* task blocked for more than 120 seconds
@ 2012-04-18 15:11 Josef 'Jeff' Sipek
  2012-04-18 18:28 ` Ben Myers
  2012-04-18 23:48 ` Dave Chinner
  0 siblings, 2 replies; 21+ messages in thread
From: Josef 'Jeff' Sipek @ 2012-04-18 15:11 UTC (permalink / raw)
  To: xfs

Greetings!  I have a file server that gets a pretty nasty load (about 15
million files created every day).  After some time, I noticed that the load
average spiked up from the usual 30 to about 180.  dmesg revealed:

[434042.318401] INFO: task php:2185 blocked for more than 120 seconds.
[434042.318403] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[434042.318405] php             D 000000010675d6cd     0  2185  27306 0x00000000
[434042.318408]  ffff88008d735a48 0000000000000086 ffff88008d735938 ffffffff00000000
[434042.318412]  ffff88008d734010 ffff88000e28e340 0000000000012000 ffff88008d735fd8
[434042.318416]  ffff88008d735fd8 0000000000012000 ffff8807ef9966c0 ffff88000e28e340
[434042.318419] Call Trace:
[434042.318442]  [<ffffffffa0087a9b>] ? xfs_trans_brelse+0xee/0xf7 [xfs]
[434042.318464]  [<ffffffffa00689de>] ? xfs_da_brelse+0x71/0x96 [xfs]
[434042.318485]  [<ffffffffa006df10>] ? xfs_dir2_leaf_lookup_int+0x211/0x225 [xfs]
[434042.318489]  [<ffffffff8141481e>] schedule+0x55/0x57
[434042.318512]  [<ffffffffa0083de2>] xlog_reserveq_wait+0x115/0x1c0 [xfs]
[434042.318515]  [<ffffffff810381f1>] ? try_to_wake_up+0x23d/0x23d
[434042.318539]  [<ffffffffa0083f45>] xlog_grant_log_space+0xb8/0x1be [xfs]
[434042.318562]  [<ffffffffa0084164>] xfs_log_reserve+0x119/0x133 [xfs]
[434042.318585]  [<ffffffffa0080cf1>] xfs_trans_reserve+0xca/0x199 [xfs]
[434042.318605]  [<ffffffffa00500dc>] xfs_create+0x18d/0x467 [xfs]
[434042.318623]  [<ffffffffa00485be>] xfs_vn_mknod+0xa0/0xf9 [xfs]
[434042.318640]  [<ffffffffa0048632>] xfs_vn_create+0xb/0xd [xfs]
[434042.318644]  [<ffffffff810f0c5d>] vfs_create+0x6e/0x9e
[434042.318647]  [<ffffffff810f1c5e>] do_last+0x302/0x642
[434042.318651]  [<ffffffff810f2068>] path_openat+0xca/0x344
[434042.318654]  [<ffffffff810f23d1>] do_filp_open+0x38/0x87
[434042.318658]  [<ffffffff810fb22e>] ? alloc_fd+0x76/0x11e
[434042.318661]  [<ffffffff810e40b1>] do_sys_open+0x10b/0x1a4
[434042.318664]  [<ffffffff810e4173>] sys_open+0x1b/0x1d

It makes sense that the load average would spike up if some major lock got
held longer than it should have been.

The box has 32GB RAM, 6 cores, and it's running 3.2.2.

I've looked at the commits in the stable tree since 3.2.2 was tagged, and I
do see a couple of useful commits, so I'll try to get the kernel updated
anyway, but I don't quite see any of those fixes addressing this "hang".

Thanks,

Jeff.

-- 
Research, n.:
  Consider Columbus:
    He didn't know where he was going.
    When he got there he didn't know where he was.
    When he got back he didn't know where he had been.
    And he did it all on someone else's money.


* Re: task blocked for more than 120 seconds
  2012-04-18 15:11 task blocked for more than 120 seconds Josef 'Jeff' Sipek
@ 2012-04-18 18:28 ` Ben Myers
  2012-04-18 23:48 ` Dave Chinner
  1 sibling, 0 replies; 21+ messages in thread
From: Ben Myers @ 2012-04-18 18:28 UTC (permalink / raw)
  To: Josef 'Jeff' Sipek; +Cc: xfs

Hey Josef,

On Wed, Apr 18, 2012 at 11:11:40AM -0400, Josef 'Jeff' Sipek wrote:
> Greetings!  I have a file server that get a pretty nasty load (about 15
> million files created every day).  After some time, I noticed that the load
> average spiked up from the usual 30 to about 180.  dmesg revealed:
> 
> [434042.318401] INFO: task php:2185 blocked for more than 120 seconds.
> [434042.318403] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [434042.318405] php             D 000000010675d6cd     0  2185  27306 0x00000000
> [434042.318408]  ffff88008d735a48 0000000000000086 ffff88008d735938 ffffffff00000000
> [434042.318412]  ffff88008d734010 ffff88000e28e340 0000000000012000 ffff88008d735fd8
> [434042.318416]  ffff88008d735fd8 0000000000012000 ffff8807ef9966c0 ffff88000e28e340
> [434042.318419] Call Trace:
> [434042.318442]  [<ffffffffa0087a9b>] ? xfs_trans_brelse+0xee/0xf7 [xfs]
> [434042.318464]  [<ffffffffa00689de>] ? xfs_da_brelse+0x71/0x96 [xfs]
> [434042.318485]  [<ffffffffa006df10>] ? xfs_dir2_leaf_lookup_int+0x211/0x225 [xfs]
> [434042.318489]  [<ffffffff8141481e>] schedule+0x55/0x57
> [434042.318512]  [<ffffffffa0083de2>] xlog_reserveq_wait+0x115/0x1c0 [xfs]
> [434042.318515]  [<ffffffff810381f1>] ? try_to_wake_up+0x23d/0x23d
> [434042.318539]  [<ffffffffa0083f45>] xlog_grant_log_space+0xb8/0x1be [xfs]
> [434042.318562]  [<ffffffffa0084164>] xfs_log_reserve+0x119/0x133 [xfs]
> [434042.318585]  [<ffffffffa0080cf1>] xfs_trans_reserve+0xca/0x199 [xfs]
> [434042.318605]  [<ffffffffa00500dc>] xfs_create+0x18d/0x467 [xfs]
> [434042.318623]  [<ffffffffa00485be>] xfs_vn_mknod+0xa0/0xf9 [xfs]
> [434042.318640]  [<ffffffffa0048632>] xfs_vn_create+0xb/0xd [xfs]
> [434042.318644]  [<ffffffff810f0c5d>] vfs_create+0x6e/0x9e
> [434042.318647]  [<ffffffff810f1c5e>] do_last+0x302/0x642
> [434042.318651]  [<ffffffff810f2068>] path_openat+0xca/0x344
> [434042.318654]  [<ffffffff810f23d1>] do_filp_open+0x38/0x87
> [434042.318658]  [<ffffffff810fb22e>] ? alloc_fd+0x76/0x11e
> [434042.318661]  [<ffffffff810e40b1>] do_sys_open+0x10b/0x1a4
> [434042.318664]  [<ffffffff810e4173>] sys_open+0x1b/0x1d
> 
> It makes sense that'd the load average would spike up if some major lock got
> held longer than it should have been.
> 
> The box has 32GB RAM, 6 cores, and it's running 3.2.2.
> 
> I've looked at the commits in the stable tree since 3.2.2 was tagged, and I
> do see a couple of useful commits so I'll try to get the kernel updated
> anyway but I don't quite see any of those fixes addressing this "hang".

I was about to suggest that 9f9c19e 'xfs: fix the logspace waiting algorithm'
is probably what you need... but that's already in 3.2.2.  Hrm.

-Ben


* Re: task blocked for more than 120 seconds
  2012-04-18 15:11 task blocked for more than 120 seconds Josef 'Jeff' Sipek
  2012-04-18 18:28 ` Ben Myers
@ 2012-04-18 23:48 ` Dave Chinner
  2012-04-19 15:46   ` Josef 'Jeff' Sipek
  1 sibling, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2012-04-18 23:48 UTC (permalink / raw)
  To: Josef 'Jeff' Sipek; +Cc: xfs

On Wed, Apr 18, 2012 at 11:11:40AM -0400, Josef 'Jeff' Sipek wrote:
> Greetings!  I have a file server that get a pretty nasty load (about 15
> million files created every day).  After some time, I noticed that the load
> average spiked up from the usual 30 to about 180.  dmesg revealed:
> 
> [434042.318401] INFO: task php:2185 blocked for more than 120 seconds.
> [434042.318403] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [434042.318405] php             D 000000010675d6cd     0  2185  27306 0x00000000
> [434042.318408]  ffff88008d735a48 0000000000000086 ffff88008d735938 ffffffff00000000
> [434042.318412]  ffff88008d734010 ffff88000e28e340 0000000000012000 ffff88008d735fd8
> [434042.318416]  ffff88008d735fd8 0000000000012000 ffff8807ef9966c0 ffff88000e28e340
> [434042.318419] Call Trace:
> [434042.318442]  [<ffffffffa0087a9b>] ? xfs_trans_brelse+0xee/0xf7 [xfs]
> [434042.318464]  [<ffffffffa00689de>] ? xfs_da_brelse+0x71/0x96 [xfs]
> [434042.318485]  [<ffffffffa006df10>] ? xfs_dir2_leaf_lookup_int+0x211/0x225 [xfs]
> [434042.318489]  [<ffffffff8141481e>] schedule+0x55/0x57
> [434042.318512]  [<ffffffffa0083de2>] xlog_reserveq_wait+0x115/0x1c0 [xfs]
> [434042.318515]  [<ffffffff810381f1>] ? try_to_wake_up+0x23d/0x23d
> [434042.318539]  [<ffffffffa0083f45>] xlog_grant_log_space+0xb8/0x1be [xfs]
> [434042.318562]  [<ffffffffa0084164>] xfs_log_reserve+0x119/0x133 [xfs]
> [434042.318585]  [<ffffffffa0080cf1>] xfs_trans_reserve+0xca/0x199 [xfs]
> [434042.318605]  [<ffffffffa00500dc>] xfs_create+0x18d/0x467 [xfs]
> [434042.318623]  [<ffffffffa00485be>] xfs_vn_mknod+0xa0/0xf9 [xfs]
> [434042.318640]  [<ffffffffa0048632>] xfs_vn_create+0xb/0xd [xfs]
> [434042.318644]  [<ffffffff810f0c5d>] vfs_create+0x6e/0x9e
> [434042.318647]  [<ffffffff810f1c5e>] do_last+0x302/0x642
> [434042.318651]  [<ffffffff810f2068>] path_openat+0xca/0x344
> [434042.318654]  [<ffffffff810f23d1>] do_filp_open+0x38/0x87
> [434042.318658]  [<ffffffff810fb22e>] ? alloc_fd+0x76/0x11e
> [434042.318661]  [<ffffffff810e40b1>] do_sys_open+0x10b/0x1a4
> [434042.318664]  [<ffffffff810e4173>] sys_open+0x1b/0x1d
> 
> It makes sense that'd the load average would spike up if some major lock got
> held longer than it should have been.

That's possibly just IO load. It's waiting for log space to become
available, and that will only occur when metadata is written back to
disk.

What's the storage subsystem, what IO scheduler is in use, and what is
the actual workload running at the time the spike occurs (e.g. did
someone run a directory traversal that changed every inode?), and so
on.

Also, getting the output of 'echo w > /proc/sysrq-trigger' would be
helpful here....
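
Something like this will capture it (untested sketch; assumes root and a
kernel ring buffer big enough to hold all the stacks):

	# dump stacks of all blocked (D-state) tasks into the kernel log
	echo w > /proc/sysrq-trigger
	# save the ring buffer right away so the stacks don't get overwritten
	dmesg > /tmp/sysrq-w-$(date +%s).txt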

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: task blocked for more than 120 seconds
  2012-04-18 23:48 ` Dave Chinner
@ 2012-04-19 15:46   ` Josef 'Jeff' Sipek
  2012-04-19 22:56     ` Dave Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Josef 'Jeff' Sipek @ 2012-04-19 15:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Apr 19, 2012 at 09:48:21AM +1000, Dave Chinner wrote:
> On Wed, Apr 18, 2012 at 11:11:40AM -0400, Josef 'Jeff' Sipek wrote:
> > Greetings!  I have a file server that get a pretty nasty load (about 15
> > million files created every day).  After some time, I noticed that the load
> > average spiked up from the usual 30 to about 180.  dmesg revealed:
> > 
> > [434042.318401] INFO: task php:2185 blocked for more than 120 seconds.
> > [434042.318403] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [434042.318405] php             D 000000010675d6cd     0  2185  27306 0x00000000
> > [434042.318408]  ffff88008d735a48 0000000000000086 ffff88008d735938 ffffffff00000000
> > [434042.318412]  ffff88008d734010 ffff88000e28e340 0000000000012000 ffff88008d735fd8
> > [434042.318416]  ffff88008d735fd8 0000000000012000 ffff8807ef9966c0 ffff88000e28e340
> > [434042.318419] Call Trace:
> > [434042.318442]  [<ffffffffa0087a9b>] ? xfs_trans_brelse+0xee/0xf7 [xfs]
> > [434042.318464]  [<ffffffffa00689de>] ? xfs_da_brelse+0x71/0x96 [xfs]
> > [434042.318485]  [<ffffffffa006df10>] ? xfs_dir2_leaf_lookup_int+0x211/0x225 [xfs]
> > [434042.318489]  [<ffffffff8141481e>] schedule+0x55/0x57
> > [434042.318512]  [<ffffffffa0083de2>] xlog_reserveq_wait+0x115/0x1c0 [xfs]
> > [434042.318515]  [<ffffffff810381f1>] ? try_to_wake_up+0x23d/0x23d
> > [434042.318539]  [<ffffffffa0083f45>] xlog_grant_log_space+0xb8/0x1be [xfs]
> > [434042.318562]  [<ffffffffa0084164>] xfs_log_reserve+0x119/0x133 [xfs]
> > [434042.318585]  [<ffffffffa0080cf1>] xfs_trans_reserve+0xca/0x199 [xfs]
> > [434042.318605]  [<ffffffffa00500dc>] xfs_create+0x18d/0x467 [xfs]
> > [434042.318623]  [<ffffffffa00485be>] xfs_vn_mknod+0xa0/0xf9 [xfs]
> > [434042.318640]  [<ffffffffa0048632>] xfs_vn_create+0xb/0xd [xfs]
> > [434042.318644]  [<ffffffff810f0c5d>] vfs_create+0x6e/0x9e
> > [434042.318647]  [<ffffffff810f1c5e>] do_last+0x302/0x642
> > [434042.318651]  [<ffffffff810f2068>] path_openat+0xca/0x344
> > [434042.318654]  [<ffffffff810f23d1>] do_filp_open+0x38/0x87
> > [434042.318658]  [<ffffffff810fb22e>] ? alloc_fd+0x76/0x11e
> > [434042.318661]  [<ffffffff810e40b1>] do_sys_open+0x10b/0x1a4
> > [434042.318664]  [<ffffffff810e4173>] sys_open+0x1b/0x1d
> > 
> > It makes sense that'd the load average would spike up if some major lock got
> > held longer than it should have been.
> 
> That's possibly just IO load. It's waiting for log space to become
> available, and that will only occur when metadata is written back to
> disk.

Yeah, it could be...

> What's the storage subsystem, what IO scheduler, the actually
> workload that is running at the time the spike occurs (e.g did
> someone run a directory traversal that changed every indoe?), and so
> on.

fs3.ess ~ # MegaCli -CfgDsply -aAll
==============================================================================
Adapter: 0
Product Name: LSI MegaRAID SAS 9260-16i
Memory: 512MB
BBU: Present
...
RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
Size                : 32.742 TB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives per span:8
Span Depth          : 2
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy       : Read/Write
Disk Cache Policy   : Disabled
Encryption Type     : None
...

fs3.ess ~ # cat /proc/mounts 
rootfs / rootfs rw 0 0
/dev/root / ext4 rw,noatime,user_xattr,barrier=1,data=ordered 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
udev /dev tmpfs rw,nosuid,relatime,size=10240k,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620 0 0
none /dev/shm tmpfs rw,relatime 0 0
usbfs /proc/bus/usb usbfs rw,nosuid,noexec,relatime,devgid=85,devmode=664 0 0
/dev/sda1 /boot ext3 rw,noatime,errors=continue,user_xattr,acl,barrier=1,data=writeback 0 0
/dev/sda4 /var/data/disk0 xfs rw,noatime,attr2,filestreams,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,noquota 0 0

fs3.ess ~ # xfs_info /var/data/disk0/
meta-data=/dev/sda4              isize=256    agcount=32, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=8519632640, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

As far as the workload is concerned, there's the typical (for us) load of
creating 15 million files in a day (pretty standard diurnal distribution as far
as the load is concerned).  Half the files are about 300 bytes, and the other
half averages 20kB in size.  On this system, the files are never read back.
Since we had a fs corruption recently (due to a power failure & disk caches
being on), our software managed to dump about 25 million files onto /.  I've
been rsync'ing them to the data partition at the same time (yes, it's slow
because they are two partitions on the same array).  The rsync aside, we've had
this workload going for a month on this particular server without any issues.
(The other server which is set up identically has been fine too.)  On Monday,
we mkfs'd this server's sda4, mounted it as before and started the workload.
Then, on Tuesday and Wednesday, we saw two loadavg spikes for no apparent
reason.  (Our system forks a bunch and then each child does network & disk I/O,
so 180 loadavg makes sense if the system gums up.)
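
(For what it's worth, the create pattern boils down to something like the
loop below - an illustrative sketch only, the directory name and counts are
made up:)

	# half the files ~300 bytes, half ~20kB, written once and never read back
	mkdir -p /var/data/disk0/test
	for i in $(seq 1 100000); do
		head -c 300   /dev/urandom > /var/data/disk0/test/small.$i
		head -c 20480 /dev/urandom > /var/data/disk0/test/large.$i
	done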

The following applies to all of the spikes, but I'm specifically talking about
the spike from this morning.  During the ~45 minute spike, there seems to be
very little disk I/O (<1 MByte/s compared to the usual 10 MBytes/s).  Since
all the I/O children get stuck waiting for disk (ps says they are in 'D'
state), the network utilization drops to essentially 0.  The CPU usage also
drops to ~0%.

I *think* things return to normal because our code eventually notices that
the children are stuck and it 'kill -9's them.  (It's nicer at first.)

> Also, getting the output of 'echo w > /proc/sysrq-trigger' whould be
> helpful here....

I assume you want this when the loadavg spikes up to 180.  /me waits for the
next time it happens.

Thanks, let me know if I forgot something.

Jeff.

-- 
All science is either physics or stamp collecting.
		- Ernest Rutherford


* Re: task blocked for more than 120 seconds
  2012-04-19 15:46   ` Josef 'Jeff' Sipek
@ 2012-04-19 22:56     ` Dave Chinner
  2012-04-20 13:58       ` Josef 'Jeff' Sipek
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2012-04-19 22:56 UTC (permalink / raw)
  To: Josef 'Jeff' Sipek; +Cc: xfs

On Thu, Apr 19, 2012 at 11:46:02AM -0400, Josef 'Jeff' Sipek wrote:
> On Thu, Apr 19, 2012 at 09:48:21AM +1000, Dave Chinner wrote:
> > On Wed, Apr 18, 2012 at 11:11:40AM -0400, Josef 'Jeff' Sipek wrote:
> > > Greetings!  I have a file server that get a pretty nasty load (about 15
> > > million files created every day).  After some time, I noticed that the load
> > > average spiked up from the usual 30 to about 180.  dmesg revealed:
> > > 
> > > [434042.318401] INFO: task php:2185 blocked for more than 120 seconds.
> > > [434042.318403] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [434042.318405] php             D 000000010675d6cd     0  2185  27306 0x00000000
> > > [434042.318408]  ffff88008d735a48 0000000000000086 ffff88008d735938 ffffffff00000000
> > > [434042.318412]  ffff88008d734010 ffff88000e28e340 0000000000012000 ffff88008d735fd8
> > > [434042.318416]  ffff88008d735fd8 0000000000012000 ffff8807ef9966c0 ffff88000e28e340
> > > [434042.318419] Call Trace:
> > > [434042.318442]  [<ffffffffa0087a9b>] ? xfs_trans_brelse+0xee/0xf7 [xfs]
> > > [434042.318464]  [<ffffffffa00689de>] ? xfs_da_brelse+0x71/0x96 [xfs]
> > > [434042.318485]  [<ffffffffa006df10>] ? xfs_dir2_leaf_lookup_int+0x211/0x225 [xfs]
> > > [434042.318489]  [<ffffffff8141481e>] schedule+0x55/0x57
> > > [434042.318512]  [<ffffffffa0083de2>] xlog_reserveq_wait+0x115/0x1c0 [xfs]
> > > [434042.318515]  [<ffffffff810381f1>] ? try_to_wake_up+0x23d/0x23d
> > > [434042.318539]  [<ffffffffa0083f45>] xlog_grant_log_space+0xb8/0x1be [xfs]
> > > [434042.318562]  [<ffffffffa0084164>] xfs_log_reserve+0x119/0x133 [xfs]
> > > [434042.318585]  [<ffffffffa0080cf1>] xfs_trans_reserve+0xca/0x199 [xfs]
> > > [434042.318605]  [<ffffffffa00500dc>] xfs_create+0x18d/0x467 [xfs]
> > > [434042.318623]  [<ffffffffa00485be>] xfs_vn_mknod+0xa0/0xf9 [xfs]
> > > [434042.318640]  [<ffffffffa0048632>] xfs_vn_create+0xb/0xd [xfs]
> > > [434042.318644]  [<ffffffff810f0c5d>] vfs_create+0x6e/0x9e
> > > [434042.318647]  [<ffffffff810f1c5e>] do_last+0x302/0x642
> > > [434042.318651]  [<ffffffff810f2068>] path_openat+0xca/0x344
> > > [434042.318654]  [<ffffffff810f23d1>] do_filp_open+0x38/0x87
> > > [434042.318658]  [<ffffffff810fb22e>] ? alloc_fd+0x76/0x11e
> > > [434042.318661]  [<ffffffff810e40b1>] do_sys_open+0x10b/0x1a4
> > > [434042.318664]  [<ffffffff810e4173>] sys_open+0x1b/0x1d
> > > 
> > > It makes sense that'd the load average would spike up if some major lock got
> > > held longer than it should have been.
> > 
> > That's possibly just IO load. It's waiting for log space to become
> > available, and that will only occur when metadata is written back to
> > disk.
> 
> Yeah, it could be...
> 
> > What's the storage subsystem, what IO scheduler, the actually
> > workload that is running at the time the spike occurs (e.g did
> > someone run a directory traversal that changed every indoe?), and so
> > on.
> 
> fs3.ess ~ # MegaCli -CfgDsply -aAll
> ==============================================================================
> Adapter: 0
> Product Name: LSI MegaRAID SAS 9260-16i
> Memory: 512MB
> BBU: Present
> ...
> RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
....
> /dev/sda4 /var/data/disk0 xfs rw,noatime,attr2,filestreams,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,noquota 0 0

The only unusual thing is that the filestreams allocator is in use...

> fs3.ess ~ # xfs_info /var/data/disk0/
> meta-data=/dev/sda4              isize=256    agcount=32, agsize=268435455 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=8519632640, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2

and it is a large log, so that's a definite candidate for long
stalls pushing the tail of the log. The typical push that you'll get
stalled on is trying to keep 25% of the log space free, so there's
potentially hundreds of megabytes of random 4/8k metadata writes to
be done to free that space in the log...
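
(For scale, from the xfs_info quoted above: 521728 blocks * 4096 bytes is
roughly a 2GB log, so the 25% push target alone is on the order of 500MB
of metadata:)

	echo $((521728 * 4096 / 1024 / 1024))      # log size in MiB: 2038
	echo $((521728 * 4096 / 1024 / 1024 / 4))  # ~25% of it in MiB: 509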

>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> As far as the workload is concerned, there's the typical (for us) load of
> create 15 million files in a day (pretty standard diurnal distribution as far
> as the load is concerned).  Half the files are about 300 bytes, and the other
> half averages 20kB in size.  On this system, the files are never read back.
> Since we had a fs corruption recently (due to a power failure & disk caches
> being on), our software managed to dump about 25 million files onto /.  I've
> been rsync'ing them to the data partition at the same time (yes, it's slow
> because they are two partitions on the same array).  The rsync aside, we've had
> this workload going for a month on this particular server without any issues.
> (The other server which is set up identically has been fine too.)  On Monday,
> we mkfs'd this server's sda4, mounted it as before and started the workload.

What was the size of the log on the previous incarnation of the
filesystem?

> Then, on Tuesday and Wednesday, we saw two loadavg spikes for no apparent
> reason.  (Our system forks a bunch and then each child does network & disk I/O,
> so 180 loadavg makes sense if the system gums up.)
> 
> The following applies to all of the spikes, but I'm specifically talking about
> the spike from this morning.  During the ~45 minute spike, there seems to be
> very little disk I/O (<1 MByte/s compared to the usual 10 MBytes/s).  Since

That sounds like it might have dropped into random 4k write IO or
inode cluster RMW cycles - a single large RAID6 volume is going to
be no faster than a single spindle at this. Can you get `iostat -d
-m -x 5` output when the next slowdown occurs so we can see the IOPS
and utilisation as well as the bandwidth?

> all the I/O children get stuck waiting for disk (ps says they are in 'D'
> state), the network utilization drops to essentially 0.  The CPU usage also
> drops to ~0%.
> 
> I *think* things return to normal because our code eventually notices that
> the children are stuck and it 'kill -9's them.  (It's nicer at first.)

You can't kill processes stuck waiting for log space, IIRC.

> > Also, getting the output of 'echo w > /proc/sysrq-trigger' whould be
> > helpful here....
> 
> I assume you want this when the loadavg spikes up to 180.  /me waits for the
> next time it happens.

Yup.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: task blocked for more than 120 seconds
  2012-04-19 22:56     ` Dave Chinner
@ 2012-04-20 13:58       ` Josef 'Jeff' Sipek
  2012-04-21  0:29         ` Dave Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Josef 'Jeff' Sipek @ 2012-04-20 13:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Apr 20, 2012 at 08:56:03AM +1000, Dave Chinner wrote:
> On Thu, Apr 19, 2012 at 11:46:02AM -0400, Josef 'Jeff' Sipek wrote:
...
> > /dev/sda4 /var/data/disk0 xfs rw,noatime,attr2,filestreams,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,noquota 0 0
> 
> Only unusual thing is the filestreams allocator is in use...

Yep.  It seemed to make sense to use it since we just want the files to get
to disk.

> > fs3.ess ~ # xfs_info /var/data/disk0/
> > meta-data=/dev/sda4              isize=256    agcount=32, agsize=268435455 blks
> >          =                       sectsz=512   attr=2
> > data     =                       bsize=4096   blocks=8519632640, imaxpct=5
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0
> > log      =internal               bsize=4096   blocks=521728, version=2
> 
> and it is a large log, so that's a definite candidate for long
> stalls pushing the tail of the log. The typical push that you'll get
> stalled on is trying to keep 25% fthe log space free, so there's
> potentially hundreds of megabytes of random 4/8k metadata writes to
> be done to free that space in the log...

FWIW, the filesystem was made using the default mkfs options (IOW: mkfs.xfs
/dev/sda4).  I didn't even notice the log being 2GB.  (I just assumed it'd
be the good ol' 128MB.)

> What was the size of the log on the previous incarnation of the
> filesystem?

I don't know.  I checked the other file server that was set up at the same
time, and it has a log that's the same size.

...
> > The following applies to all of the spikes, but I'm specifically talking about
> > the spike from this morning.  During the ~45 minute spike, there seems to be
> > very little disk I/O (<1 MByte/s compared to the usual 10 MBytes/s).  Since
> 
> That sounds like it might have dropped into random 4k write IO or
> inode cluster RMW cycles - a single large RAID6 volume is going to
> be no faster than a single spindle at this. Can you get `iostat -d
> -m -x 5` output when the next slowdown occurs so we can see the IOPS
> and utilisation as well as the bandwidth?

Right.  Here's the output from a few minutes ago (I removed the empty lines
and redundant column headings):

fs3.ess ~ # iostat -d -m -x 5
Linux 3.2.2 (fs3.ess.sfj.cudaops.com) 	04/20/12 	_x86_64_	(6 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              11.90   129.18  261.81  457.22     2.81     6.76    27.26     4.44    6.18    9.60    4.22   0.82  59.02
sda               0.00     0.20   61.20  112.60     0.49     0.88    16.17     1.50    8.64   17.66    3.74   5.74  99.76
sda               0.00     0.60   61.40  117.60     0.48     0.92    16.02     1.33    7.41   16.18    2.82   5.56  99.52
sda               0.00     1.20   63.40  104.20     0.50     0.82    16.09     1.28    7.65   15.70    2.75   5.94  99.60
sda               0.00    43.00   59.20  166.60     0.46     1.64    19.05     1.63    7.23   16.82    3.82   4.41  99.60
sda               0.00     0.40   60.80  139.80     0.47     1.09    16.00     1.32    6.57   16.25    2.36   4.96  99.44
sda               0.00     1.20   59.60  108.00     0.47     0.85    16.08     1.22    7.34   16.82    2.10   5.95  99.68
sda               0.00     0.40   66.60  102.80     0.52     0.80    16.01     1.28    7.57   15.00    2.75   5.88  99.68
sda               0.00     4.80   67.00  111.60     0.52     0.91    16.39     1.21    6.75   14.85    1.88   5.58  99.68
sda               0.00     0.60   64.60  103.40     0.50     0.81    16.02     1.16    6.90   15.48    1.55   5.94  99.76
sda               0.00    23.00   65.20  122.60     0.51     1.14    17.96     1.34    7.13   15.28    2.80   5.31  99.68
sda               0.00     0.60   62.40  153.60     0.49     1.20    16.01     1.40    6.46   15.88    2.63   4.63  99.92
sda               0.00    13.60   56.20  131.20     0.44     1.13    17.13     1.39    7.43   17.81    2.98   5.31  99.52
sda               0.00    18.80   29.80  422.00     0.23     3.44    16.66    20.54   45.46   33.40   46.31   2.20  99.52
sda               0.00     0.40   80.40    2.00     0.63     0.01    15.98     0.99   12.09   12.39    0.00  12.06  99.36
sda               0.00     3.60   70.20   12.60     0.55     0.12    16.64     1.00   11.71   13.81    0.00  12.04  99.68
sda               0.00    28.20   66.60   40.40     0.52     0.54    20.22     1.12   10.68   15.33    3.01   9.29  99.36
sda               0.00     0.80   71.80   89.00     0.56     0.70    16.04     1.24    7.66   13.77    2.72   6.21  99.84
sda               0.00     3.60   56.60  156.00     0.44     1.24    16.24     1.51    7.10   17.58    3.29   4.69  99.68
sda               0.00     0.40   64.80  106.80     0.51     0.84    16.01     1.18    6.93   15.56    1.69   5.82  99.84
sda               0.00     4.60   22.80  439.40     0.18     3.46    16.14    25.95   56.14   43.68   56.78   2.16  99.68
sda               0.00    28.20   70.60   87.60     0.57     0.90    19.02     1.33    8.43   14.29    3.71   6.29  99.52
sda               0.00     0.60   83.60    4.00     0.65     0.03    16.02     0.99   11.33   11.88    0.00  11.33  99.28
sda               0.00    13.20   80.00   44.40     0.62     0.23    14.10     1.00    8.08   12.45    0.20   8.00  99.52
sda               0.00     1.00   74.60   41.40     0.58     0.33    16.06     1.08    9.23   13.22    2.05   8.58  99.52
sda               0.00     0.40   63.00   95.20     0.49     0.74    16.01     1.28    8.16   15.89    3.05   6.28  99.36
sda               0.00     5.00   22.60  482.60     0.18     3.81    16.15     9.94   19.65   43.68   18.53   1.98 100.00
sda               0.00     0.20   75.40   10.60     0.59     0.08    15.98     1.00   11.75   13.40    0.00  11.57  99.52
sda               0.00     0.00   78.00    1.60     0.61     0.01    16.00     1.00   12.50   12.76    0.00  12.53  99.76
sda               0.00    14.40   75.80   52.40     0.59     0.52    17.75     1.15    8.95   13.12    2.93   7.76  99.44
sda               0.00    36.40   70.60  101.80     0.55     1.08    19.34     1.29    7.47   14.11    2.86   5.77  99.44
sda               0.00     0.60   63.80  143.40     0.50     1.12    16.02     1.49    7.18   15.59    3.44   4.81  99.60
sda               0.00    32.80   55.40  197.00     0.43     1.79    18.05     1.67    6.62   17.92    3.44   3.94  99.44
sda               0.00     2.20   62.60  132.20     0.49     1.03    16.02     1.38    7.08   15.96    2.88   5.12  99.76
sda               0.00     6.20   64.20  122.40     0.50     1.00    16.49     1.35    7.20   15.45    2.88   5.34  99.60
sda               0.00     0.00   62.60  111.20     0.49     0.87    16.00     1.28    7.38   16.00    2.53   5.74  99.76
sda               0.00     7.20   58.20  121.60     0.45     1.00    16.61     1.16    6.27   16.54    1.36   5.53  99.36
sda               0.00     0.20   68.20  104.40     0.53     0.81    15.99     1.33    7.88   15.10    3.16   5.78  99.76
sda               0.00    14.60   62.40  122.40     0.49     1.07    17.23     1.49    8.08   16.00    4.04   5.40  99.76


And now while the system is coming back to the usual loadavg (of 30):

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    26.00   29.00 1235.40     0.13    17.10    27.91   202.30  161.37 1792.52  123.08   0.79 100.00
sda               0.00    29.80   26.20 1095.60     0.11    15.34    28.22   185.88  187.19 2248.92  137.89   0.89 100.00
sda               0.00    28.20   11.80 1527.20     0.06    19.80    26.44   165.23  112.77  665.83  108.50   0.65 100.00
sda               0.20     2.80  412.40  320.20     2.09     7.00    25.40    61.10   87.40   53.49  131.06   1.37 100.00
sda               2.60     1.60 1056.00   52.20     5.82     0.57    11.81    21.87   19.68   20.65    0.03   0.90 100.00
sda               0.00     7.40 1256.40   34.40     5.77     0.30     9.63    24.81   19.33   19.85    0.19   0.77 100.00
sda               0.00    28.80  917.60  273.00     4.18     2.42    11.34    26.72   22.38   27.52    5.12   0.84 100.00
sda               0.00     0.20 1302.00   28.20     5.90     0.37     9.64    24.75   18.69   19.09    0.03   0.75 100.00
sda               0.00    42.60   61.40  964.60     0.30    11.75    24.05   158.59  144.06  323.64  132.63   0.97 100.00
sda               0.00    29.40   24.00  675.00     0.11     5.00    14.96   143.77  209.05 1007.17  180.67   1.43 100.00
sda               0.00    21.60   28.00  785.00     0.16     6.38    16.48   177.47  220.53  930.09  195.22   1.23 100.00
sda               0.00    31.00   24.40  964.20     0.13    10.67    22.36   177.09  177.76  767.02  162.85   1.01 100.00
sda               0.00    27.00   24.80 1174.20     0.11    14.33    24.68   170.26  144.93  911.90  128.73   0.83 100.00
sda               0.00    34.40    9.00  700.80     0.05     2.98     8.73   151.54  212.61  797.51  205.09   1.41 100.00
sda               0.00    37.20    9.00  776.60     0.05     4.09    10.78   152.84  192.65  775.91  185.89   1.27 100.00
sda               0.00    36.60   14.80  716.40     0.07     3.35     9.58   167.69  214.14  769.19  202.68   1.37 100.00
sda               0.00     5.80   54.00 1824.00     0.24    25.04    27.56   200.50  107.98  950.50   83.04   0.53 100.00
sda               0.00     1.80  207.60 1659.60     0.87    36.25    40.72   151.21   89.22  316.71   60.77   0.54 100.00
sda               0.00     2.00 1513.20   33.20     6.96     0.27     9.57    27.73   17.94   18.33    0.10   0.65 100.00
sda               0.00     1.60 1257.00   14.00     5.78     0.68    10.41    24.57   19.34   19.54    1.54   0.79 100.00
sda               0.00    14.20 1214.00  122.80     5.51     5.54    16.93    23.34   17.42   19.05    1.29   0.75 100.00
sda               0.00    13.60  892.80  590.40     4.12     2.67     9.38    25.95   16.39   23.63    5.45   0.67 100.00
sda               0.00    22.00   31.00  775.40     0.15     3.64     9.60    96.49  115.41  671.79   93.16   1.24 100.00
sda               0.00    29.40   29.60  679.40     0.14     4.54    13.52   153.30  215.36  860.62  187.25   1.41 100.00
sda               0.00    20.60   23.00  635.60     0.11     3.06     9.85   136.16  198.69  772.73  177.92   1.52 100.00
sda               0.00    21.40   21.00  909.20     0.10     7.39    16.49   178.96  196.48 1222.10  172.79   1.08 100.00
sda               0.00     6.00   51.80 1457.20     0.23    26.23    35.91   200.49  124.56  666.36  105.30   0.66 100.00
sda               0.00    38.60   29.60  701.00     0.12     3.23     9.40   182.10  267.15 1711.57  206.16   1.37 100.00

Oh, and:

fs3.ess ~ # cat /sys/block/sda/queue/scheduler 
noop [deadline] cfq 

Jeff.

-- 
You measure democracy by the freedom it gives its dissidents, not the
freedom it gives its assimilated conformists.
		- Abbie Hoffman


* Re: task blocked for more than 120 seconds
  2012-04-20 13:58       ` Josef 'Jeff' Sipek
@ 2012-04-21  0:29         ` Dave Chinner
  2012-04-23 17:16           ` Josef 'Jeff' Sipek
  2012-04-23 20:24           ` Josef 'Jeff' Sipek
  0 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2012-04-21  0:29 UTC (permalink / raw)
  To: Josef 'Jeff' Sipek; +Cc: xfs

On Fri, Apr 20, 2012 at 09:58:20AM -0400, Josef 'Jeff' Sipek wrote:
> On Fri, Apr 20, 2012 at 08:56:03AM +1000, Dave Chinner wrote:
> > On Thu, Apr 19, 2012 at 11:46:02AM -0400, Josef 'Jeff' Sipek wrote:
> FWIW, the filesystem was made using the default mkfs options (IOW: mkfs.xfs
> /dev/sda4).  I didn't even notice the log being 2GB.  (I just assumed it'd
> be the good ol' 128MB.)

OK.

> ...
> > > The following applies to all of the spikes, but I'm specifically talking about
> > > the spike from this morning.  During the ~45 minute spike, there seems to be
> > > very little disk I/O (<1 MByte/s compared to the usual 10 MBytes/s).  Since
> > 
> > That sounds like it might have dropped into random 4k write IO or
> > inode cluster RMW cycles - a single large RAID6 volume is going to
> > be no faster than a single spindle at this. Can you get `iostat -d
> > -m -x 5` output when the next slowdown occurs so we can see the IOPS
> > and utilisation as well as the bandwidth?
> 
> Right.  Here's the output from a few minutes ago (I removed the empty lines
> and redundant column headings):

Thanks, that makes it easy to read ;)

> fs3.ess ~ # iostat -d -m -x 5
> Linux 3.2.2 (fs3.ess.sfj.cudaops.com) 	04/20/12 	_x86_64_	(6 CPU)
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda              11.90   129.18  261.81  457.22     2.81     6.76    27.26     4.44    6.18    9.60    4.22   0.82  59.02
> sda               0.00     0.20   61.20  112.60     0.49     0.88    16.17     1.50    8.64   17.66    3.74   5.74  99.76
> sda               0.00     0.60   61.40  117.60     0.48     0.92    16.02     1.33    7.41   16.18    2.82   5.56  99.52
> sda               0.00     1.20   63.40  104.20     0.50     0.82    16.09     1.28    7.65   15.70    2.75   5.94  99.60

Ok, so the queue depth here is showing serialised IO (barely above
1), and the request size is 8k, which will be inode clusters. The
service time for reads is around 16ms, which assuming SATA drives is
a full disk seek. The writes are much faster due to the BBWC. It
looks, however, like it is reading inodes from all over the drive,
and that is the delay factor here. The serialisation indicates that
it is either log tail pushing or low-memory inode reclaim that is
slowing everyone down here.

Now comes the "more data needed" bit. I still need the sysrq-w
output, but also given the length of the incident, some other data
is definitely needed:

	- a 30s event trace - it'll compress pretty well
	  (trace-cmd record -e xfs* sleep 30; trace-cmd report > output.txt)
	- vmstat 1 output for the same period
	- output of /proc/meminfo at the same time
	- the same iostat output for comparison.

> And now while the system is coming back to the usual loadavg (of 30):
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0.00    26.00   29.00 1235.40     0.13    17.10    27.91   202.30  161.37 1792.52  123.08   0.79 100.00
> sda               0.00    29.80   26.20 1095.60     0.11    15.34    28.22   185.88  187.19 2248.92  137.89   0.89 100.00
> sda               0.00    28.20   11.80 1527.20     0.06    19.80    26.44   165.23  112.77  665.83  108.50   0.65 100.00

Yeah, that looks more like a properly working filesystem :/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: task blocked for more than 120 seconds
  2012-04-21  0:29         ` Dave Chinner
@ 2012-04-23 17:16           ` Josef 'Jeff' Sipek
  2012-04-23 20:24           ` Josef 'Jeff' Sipek
  1 sibling, 0 replies; 21+ messages in thread
From: Josef 'Jeff' Sipek @ 2012-04-23 17:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Sat, Apr 21, 2012 at 10:29:32AM +1000, Dave Chinner wrote:
> On Fri, Apr 20, 2012 at 09:58:20AM -0400, Josef 'Jeff' Sipek wrote:
...
> > Right.  Here's the output from a few minutes ago (I removed the empty lines
> > and redundant column headings):
> 
> Thanks, that makes it easy to read ;)

Yeah, I don't know whose brilliant idea it was to output the headings every
time.

> Now comes the "more data needed" bit. I still need the sysrq-w
> output,

D'oh!  Totally forgot to put that up somewhere.

http://31bits.net/download/xfs_log.gz

Sadly, it looks like the kernel's ring buffer wrapped while dumping the ~180
stacks.

> but also given the length of the incident, some other data
> is definitely needed:
> 
> 	- a 30s event trace - it'll compress pretty well
> 	  (trace-cmd record -e xfs* sleep 30; trace-cmd report > output.txt)

Hrm... Time to figure out how to get ftrace going on Gentoo.  I don't see an
obvious package to install to get trace-cmd.
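
(Plan B is to just build it from source - repo path written from memory, so
it needs double-checking on git.kernel.org:)

	git clone git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
	cd trace-cmd && make
	# then, during the next spike:
	./trace-cmd record -e xfs* sleep 30
	./trace-cmd report > output.txt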

> 	- vmstat 1 output for the same period
> 	- output of /proc/meminfo at the same time
> 	- the same iostat output for comparison.

Jeff.

-- 
I'm somewhere between geek and normal.
		- Linus Torvalds


* Re: task blocked for more than 120 seconds
  2012-04-21  0:29         ` Dave Chinner
  2012-04-23 17:16           ` Josef 'Jeff' Sipek
@ 2012-04-23 20:24           ` Josef 'Jeff' Sipek
  2012-04-23 23:27             ` Dave Chinner
  1 sibling, 1 reply; 21+ messages in thread
From: Josef 'Jeff' Sipek @ 2012-04-23 20:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Sat, Apr 21, 2012 at 10:29:32AM +1000, Dave Chinner wrote:
...
> but also given the length of the incident, some other data is definitely
> needed:
> 	- a 30s event trace - it'll compress pretty well
> 	  (trace-cmd record -e xfs* sleep 30; trace-cmd report > output.txt)

http://31bits.net/download/output-1335211829.txt.bz2

> 	- vmstat 1 output for the same period

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 191064 321700 16540692    0    0   471  1131    1    2  3  2 81 15
 0  0      0 190248 321700 16540696    0    0   624   744  615  289  0  0 100  0
 0  0      0 188728 321708 16540700    0    0   568   732 1544 2427  0  1 99  0
 0  0      0 186848 321708 16540772    0    0   544     0 5789 11321  0  1 99  0
 0  0      0 185856 321708 16540928    0    0   556   728 5801 11318  0  1 99  0
 0  0      0 185608 321708 16541032    0    0   504   704 5952 11360  0  2 98  0
 0  0      0 184740 321708 16541188    0    0   488   792 5799 11293  0  1 99  0
 0  0      0 183996 321708 16541248    0    0   488   608 5937 11369  0  1 99  0
 0  0      0 183004 321716 16541420    0    0   664   504 5834 11396  0  1 99  0
 0  0      0 182136 321716 16541560    0    0   656     0 5790 11332  0  2 98  0
 0  0      0 179408 321716 16544120    0    0   944  1072 5877 11428  0  1 98  0
 1  0      0 177844 321716 16544828    0    0   232 11448 5996 11443  0  1 98  0
 0  0      0 177968 321716 16544944    0    0     0     0 5703 11177  0  1 98  0
 0  0      0 177472 321724 16545028    0    0   264    20 5769 11264  1  1 99  0
 0  0      0 176852 321724 16545116    0    0   672     0 5827 11355  0  2 98  0
 0  0      0 176140 321724 16545164    0    0   656     0 5769 11339  0  1 99  0
 0  0      0 175396 321724 16545304    0    0   664    24 5797 11320  1  2 98  0
 0  0      0 173024 321724 16545408    0    0   668    40 5932 11393  1  1 98  0
 0  0      0 172280 321732 16545504    0    0   544    44 5761 11309  0  1 99  0
 0  0      0 171660 321732 16545632    0    0   632     8 5782 11336  0  1 99  0
 0  0      0 171040 321732 16546000    0    0   648   176 5796 11330  0  1 99  0
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 167320 321732 16548276    0    0  1312     0 5864 11479  0  1 99  0
 0  0      0 165832 321732 16548452    0    0   568  1224 5883 11359  0  1 99  0
 0  0      0 164716 321740 16548524    0    0   640   144 5871 11346  0  2 98  0
 0  0      0 163600 321740 16549920    0    0  1048   576 5876 11411  0  1 99  0
 0  0      0 161988 321740 16551356    0    0   944  1000 5849 11402  0  1 99  0
 0  0      0 161368 321740 16551964    0    0   624    48 5788 11324  0  2 98  0
 0  0      0 160996 321740 16552044    0    0   584     0 5772 11339  0  1 99  0
 0  0      0 160252 321748 16552136    0    0   624    20 5741 11327  0  0 100  0
 0  0      0 165708 321736 16546552    0    0   752   456 5895 11382  0  1 98  0
 0  0      0 164468 321736 16546952    0    0   664   376 5822 11326  0  1 99  0
 0  0      0 163724 321736 16547180    0    0   608   152 5800 11336  0  1 99  0
 0  0      0 163600 321736 16547260    0    0   672   872 4851 9169  0  1 99  0
 0  0      0 162880 321736 16547336    0    0   536     0  457  232  0  0 100  0
 1  0      0 164880 321740 16537360    0    0   376 16020  616  317  5  2 93  0
 1  0      0 167740 321368 16536716    0    0   496   888  597  258 17  0 83  0
 0  0      0 176048 317916 16540552    0    0   480   752  629  248 14  1 85  0
 0  0      0 175684 317916 16540448    0    0   472   800  230  216  0  0 100  0
 0  0      0 175328 317916 16540448    0    0   592   560  217  249  0  0 100  0
 0  0      0 174632 317916 16540488    0    0   600     0  254  257  0  0 100  0
 0  0      0 174508 317924 16540476    0    0   528   364  234  289  0  0 100  0
 0  0      0 173764 317924 16540480    0    0   584   640  290  280  0  0 100  0
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 173392 317924 16540480    0    0   736  1088  310  324  0  0 100  0
 0  0      0 172720 317924 16540480    0    0   608   328  279  301  0  0 100  0
 0  0      0 172100 317924 16540480    0    0   680   744  270  309  0  0 100  0
 0  0      0 171728 317932 16540480    0    0   504    44  197  218  0  0 100  0
 0  0      0 172852 317932 16540484    0    0   504   776  354  270  0  0 100  0
 0  0      0 172380 317932 16540488    0    0   432   680  239  217  0  0 100  0
 0  0      0 171760 317932 16540484    0    0   552   616  208  221  0  0 100  0
 0  0      0 171388 317932 16540484    0    0   528   816  212  229  0  0 100  0

> 	- output of /proc/meminfo at the same time

MemTotal:       33017676 kB
MemFree:          191312 kB
Buffers:          321700 kB
Cached:         16540692 kB
SwapCached:            0 kB
Active:         15167228 kB
Inactive:        9247756 kB
Active(anon):    6736980 kB
Inactive(anon):   817368 kB
Active(file):    8430248 kB
Inactive(file):  8430388 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3808252 kB
SwapFree:        3808252 kB
Dirty:            434824 kB
Writeback:             0 kB
AnonPages:       7552692 kB
Mapped:            10380 kB
Shmem:              1740 kB
Slab:            3689788 kB
SReclaimable:    2628672 kB
SUnreclaim:      1061116 kB
KernelStack:        2752 kB
PageTables:        58084 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    20317088 kB
Committed_AS:    9698260 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      329288 kB
VmallocChunk:   34359405376 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        6720 kB
DirectMap2M:     3137536 kB
DirectMap1G:    30408704 kB

> 	- the same iostat output for comparison.

Linux 3.2.2 (fs3.ess.sfj.cudaops.com) 	04/23/12 	_x86_64_	(6 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               8.89    95.35  306.14  470.40     2.75     6.60    24.67     3.52    4.54    5.62    3.83   0.91  70.84
sda               0.00     2.00   70.20   71.80     0.55     0.57    16.07     1.02    7.19   14.54    0.01   7.03  99.84
sda               0.00     2.40   80.80   73.40     0.63     0.58    16.10     1.02    6.58   12.16    0.45   6.39  98.48
sda               0.00    12.20   45.80  275.00     0.36     2.24    16.58     3.96   12.36   21.87   10.78   3.11  99.92
sda               0.00     2.00   78.60    6.40     0.61     0.06    16.19     1.00   11.76   12.71    0.00  11.71  99.52
sda               0.00    30.20  113.00   44.00     0.88     0.57    19.02     0.99    6.29    8.74    0.00   6.30  98.88
sda               0.00     2.40   81.20   20.60     0.63     0.18    16.30     1.00    9.80   12.29    0.00   9.80  99.76
sda               0.00     1.20   67.20   80.00     0.53     3.50    56.03     1.13    7.67   14.88    1.62   6.78  99.84
sda               0.00     2.60   66.80   60.80     0.52     0.48    16.14     1.03    8.08   14.92    0.57   7.82  99.84
sda               0.00     8.00   77.60   64.20     0.61     0.56    16.78     1.01    7.11   12.86    0.16   7.03  99.68

Anything else?

Jeff.

P.S. I had a script prepared to fire off whenever the time came:

# cat run-when-loaded.sh 
#!/bin/sh

TOK=`date +%s`

cat /proc/mounts > ~/mounts-$TOK.txt
cat /proc/meminfo > ~/meminfo-$TOK.txt

(
	vmstat 1 50 > ~/vmstat-$TOK.txt
) &

(
	iostat -d -m -x 5 10 > ~/iostat-$TOK.txt
) &

(
	cd trace-cmd
	./trace-cmd record -e xfs* sleep 30
	./trace-cmd report > ~/output-$TOK.txt
)


* Re: task blocked for more than 120 seconds
  2012-04-23 20:24           ` Josef 'Jeff' Sipek
@ 2012-04-23 23:27             ` Dave Chinner
  2012-04-24 15:10               ` Josef 'Jeff' Sipek
  2012-09-27 12:49               ` Josef 'Jeff' Sipek
  0 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2012-04-23 23:27 UTC (permalink / raw)
  To: Josef 'Jeff' Sipek; +Cc: xfs

On Mon, Apr 23, 2012 at 04:24:41PM -0400, Josef 'Jeff' Sipek wrote:
> On Sat, Apr 21, 2012 at 10:29:32AM +1000, Dave Chinner wrote:
> ...
> > but also given the length of the incident, some other data is definitely
> > needed:
> > 	- a 30s event trace - it'll compress pretty well
> > 	  (trace-cmd record -e xfs* sleep 30; trace-cmd report > output.txt)
> 
> http://31bits.net/download/output-1335211829.txt.bz2

Ok, that's instructive. Inode RMW cycles in the xfsaild:

xfsaild/sda4  968486.257027: xfs_buf_read:         dev 8:4 bno 0xe000d4e10 len 0x2000
xfsaild/sda4  968486.338194: xfs_buf_delwri_queue: dev 8:4 bno 0xe000d4e10 len 0x2000
xfsaild/sda4  968486.338203: xfs_buf_read:         dev 8:4 bno 0xb00ff7c00 len 0x2000
xfsaild/sda4  968486.351177: xfs_buf_delwri_queue: dev 8:4 bno 0xb00ff7c00 len 0x2000
xfsaild/sda4  968486.351183: xfs_buf_read:         dev 8:4 bno 0x280054e38 len 0x2000
xfsaild/sda4  968486.351194: xfs_buf_delwri_queue: dev 8:4 bno 0x280054e38 len 0x2000
xfsaild/sda4  968486.351200: xfs_buf_read:         dev 8:4 bno 0x580057ee8 len 0x2000
xfsaild/sda4  968486.363347: xfs_buf_delwri_queue: dev 8:4 bno 0x580057ee8 len 0x2000
xfsaild/sda4  968486.363355: xfs_buf_read:         dev 8:4 bno 0x500522980 len 0x2000
xfsaild/sda4  968486.373812: xfs_buf_delwri_queue: dev 8:4 bno 0x500522980 len 0x2000
xfsaild/sda4  968486.373817: xfs_buf_read:         dev 8:4 bno 0x800412390 len 0x2000
xfsaild/sda4  968486.373829: xfs_buf_delwri_queue: dev 8:4 bno 0x800412390 len 0x2000
xfsaild/sda4  968486.373835: xfs_buf_read:         dev 8:4 bno 0xe005a07c0 len 0x2000
xfsaild/sda4  968486.386211: xfs_buf_delwri_queue: dev 8:4 bno 0xe005a07c0 len 0x2000
xfsaild/sda4  968486.386220: xfs_buf_read:         dev 8:4 bno 0x5801af5f8 len 0x2000
xfsaild/sda4  968486.400109: xfs_buf_delwri_queue: dev 8:4 bno 0x5801af5f8 len 0x2000
xfsaild/sda4  968486.400116: xfs_buf_read:         dev 8:4 bno 0xf01026940 len 0x2000
xfsaild/sda4  968486.400128: xfs_buf_delwri_queue: dev 8:4 bno 0xf01026940 len 0x2000
xfsaild/sda4  968486.400135: xfs_buf_read:         dev 8:4 bno 0xe00fccac0 len 0x2000
xfsaild/sda4  968486.517162: xfs_buf_delwri_queue: dev 8:4 bno 0xe00fccac0 len 0x2000
xfsaild/sda4  968486.517169: xfs_buf_read:         dev 8:4 bno 0x4007ba2d0 len 0x2000
xfsaild/sda4  968486.517179: xfs_buf_delwri_queue: dev 8:4 bno 0x4007ba2d0 len 0x2000
xfsaild/sda4  968486.517187: xfs_buf_read:         dev 8:4 bno 0x8800652c8 len 0x2000
xfsaild/sda4  968486.524118: xfs_buf_delwri_queue: dev 8:4 bno 0x8800652c8 len 0x2000
xfsaild/sda4  968486.524126: xfs_buf_read:         dev 8:4 bno 0x2811e0dc8 len 0x2000
xfsaild/sda4  968486.536576: xfs_buf_delwri_queue: dev 8:4 bno 0x2811e0dc8 len 0x2000
.....
xfsaild/sda4  968516.199683: xfs_buf_read:         dev 8:4 bno 0x7008ebfb0 len 0x2000
xfsaild/sda4  968516.212424: xfs_buf_delwri_queue: dev 8:4 bno 0x7008ebfb0 len 0x2000


Every buffer read is followed by a queuing for write.  It is also
showing that it is typically taking 10-25ms per inode read IO, which
is exactly what I'd expect for your given storage. There are 2500 of
these over the 30s period, which translates to about one every 12ms
across the 30s sample.

So, yes, your hangs are definitely due to inode buffer RMW cycles
when trying to flush dirty inodes from the cache. I have a few
ideas on how to fix this - but I'm not sure whether a current TOT
solution will be easily back-portable. The simplest solution is a
readahead based solution - AIL pushing is async, and will cycle back
to inodes that it failed to flush the first time past, so triggering
readahead on the first pass might work just fine.

Right now we do:

	read inode buffer (trylock)
	no buffer:
		read from disk
		wait
	flush inode to buffer
	queue buffer for write

So the aild is essentially blocking on inode buffer read IO. What
would be better is:


	read inode buffer (trylock)
	no buffer:
		issue readahead to disk
		fail read
	xfsaild skips inode
	.....
	read inode buffer (trylock)
	buffer found
	flush inode to buffer
	queue buffer for write

That way the xfsaild will make a pass across the AIL doing
readahead and doesn't block on RMW cycles. Effectively we get async
RMW cycles occurring, and the latency of a single cycle will no
longer be the performance limiting factor. I'll start to prototype
something to address this - it isn't a new idea, and I've seen it
done before, so I should be able to get something working.

> Anything else?

No, I know what the problem is now.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: task blocked for more than 120 seconds
  2012-04-23 23:27             ` Dave Chinner
@ 2012-04-24 15:10               ` Josef 'Jeff' Sipek
  2012-09-27 12:49               ` Josef 'Jeff' Sipek
  1 sibling, 0 replies; 21+ messages in thread
From: Josef 'Jeff' Sipek @ 2012-04-24 15:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Apr 24, 2012 at 09:27:59AM +1000, Dave Chinner wrote:
> On Mon, Apr 23, 2012 at 04:24:41PM -0400, Josef 'Jeff' Sipek wrote:
> > On Sat, Apr 21, 2012 at 10:29:32AM +1000, Dave Chinner wrote:
> > ...
> > > but also given the length of the incident, some other data is definitely
> > > needed:
> > > 	- a 30s event trace - it'll compress pretty well
> > > 	  (trace-cmd record -e xfs* sleep 30; trace-cmd report > output.txt)
> > 
> > http://31bits.net/download/output-1335211829.txt.bz2
> 
> Ok, that's instructive. Inode RMW cycles in the xfsaild:
...
> Every buffer read is followed by a queuing for write.  It is also
> showing that it is typically taking 10-25ms per inode read IO, which
> is exactly what I'd expect for your given storage. There are 2500 of
> these over the 30s period, which translates to about one every 12ms
> across the 30s sample.
> 
> So, yes, your hangs are definitely due to inode buffer RMW cycles
> when trying to flush dirty inodes from the cache. I have a few
> ideas on how to fix this - but I'm not sure whether a current TOT
> solution will be easily back-portable.

If it is too much effort to backport, we should be able to move the box to a
3.3 stable kernel (assuming no driver problems).

> The simplest solution is a readahead based solution - AIL pushing is
> async, and will cycle back to inodes that it failed to flush the first
> time past, so triggering readahead on the first pass might work just fine.
...

This makes sense.  With a large enough log, could you not end up evicting
the readahead inodes by the time you get back to them?

> That way the xfsaild will make a pass across the AIL doing
> readahead and doesn't block on RMW cycles. Effectively we get async
> RMW cycles occurring, and the latency of a single cycle will no
> longer be the performance limiting factor. I'll start to prototype
> something to address this - it isn't a new idea, and I've seen it
> done before, so I should be able to get something working.

Cool.  Let me know when you have something we can try.  I don't quite know
what it is that's causing this giant backlog of inode modifications - I
suspect it's the rsync that's pushing it over.  But regardless, I'm
interested in testing the fix.

Thanks!

Jeff.

-- 
Fact: 30.3% of all statistics are generated randomly.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: task blocked for more than 120 seconds
  2012-04-23 23:27             ` Dave Chinner
  2012-04-24 15:10               ` Josef 'Jeff' Sipek
@ 2012-09-27 12:49               ` Josef 'Jeff' Sipek
  2012-09-27 22:50                 ` Dave Chinner
  1 sibling, 1 reply; 21+ messages in thread
From: Josef 'Jeff' Sipek @ 2012-09-27 12:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Apr 24, 2012 at 09:27:59AM +1000, Dave Chinner wrote:
...
> So, yes, your hangs are definitely due to inode buffer RMW cycles
> when trying to flush dirty inodes from the cache. I have a few
> ideas on how to fix this - but I'm not sure whether a current TOT
> solution will be easily back-portable. The simplest solution is a
> readahead based solution - AIL pushing is async, and will cycle back
> to inodes that it failed to flush the first time past, so triggering
> readahead on the first pass might work just fine.

Have you had time to look at this?  (We got bitten by this again and I
really don't want to go back to ext4.)  Is there anything I can help with?

Jeff.

-- 
Evolution, n.:
  A hypothetical process whereby infinitely improbable events occur with
  alarming frequency, order arises from chaos, and no one is given credit.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: task blocked for more than 120 seconds
  2012-09-27 12:49               ` Josef 'Jeff' Sipek
@ 2012-09-27 22:50                 ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2012-09-27 22:50 UTC (permalink / raw)
  To: Josef 'Jeff' Sipek; +Cc: xfs

On Thu, Sep 27, 2012 at 08:49:15AM -0400, Josef 'Jeff' Sipek wrote:
> On Tue, Apr 24, 2012 at 09:27:59AM +1000, Dave Chinner wrote:
> ...
> > So, yes, your hangs are definitely due to inode buffer RMW cycles
> > when trying to flush dirty inodes from the cache. I have a few
> > ideas on how to fix this - but I'm not sure whether a current TOT
> > solution will be easily back-portable. The simplest solution is a
> > readahead based solution - AIL pushing is async, and will cycle back
> > to inodes that it failed to flush the first time past, so triggering
> > readahead on the first pass might work just fine.
> 
> Have you had time to look at this?

No.

> (We got bitten by this again and I
> really don't want to go back to ext4.)  Is there anything I can help with?

What is needed is an xfs_inode_readahead() function, which takes an
XFS inode and attempts to issue readahead on the underlying buffer
if it is not found in cache.

The simplest thing to do is add another flag to xfs_buf_read() that
is passed through from xfs_iflush() to say "don't block on read".
Indeed, we used to have an XBF_DONT_BLOCK flag that we removed
recently because it was the default behaviour for everything. That
could be reintroduced to tell xfs_buf_read_map() that it should
return the buffer if it is in cache, or issue readahead and return
NULL if it is not found in memory so that a read a little while
later might find it. Combined with the XBF_TRYLOCK that xfs_iflush()
already uses, it won't block on read IO already in progress, either.
That is, there's a difference between not-in-cache and
in-cache-but-locked when it comes to issuing readahead, so there
might need to be slight changes to xfs_buf_find() to accommodate
that.
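
As a rough sketch of how that might hang together (pseudocode only;
the flag combination and the EAGAIN-style return are illustrative,
not necessarily what would end up in the tree):

	xfs_iflush:
		bp = xfs_buf_read(inode cluster, XBF_TRYLOCK | XBF_DONT_BLOCK)
		buffer in cache and lock obtained:
			flush inode to buffer
			queue buffer for write
		buffer not in cache:
			issue readahead on the inode cluster buffer
			return EAGAIN	-> xfsaild leaves the inode on the AIL
		buffer in cache but locked (e.g. read IO in progress):
			return EAGAIN	-> no readahead needed, just retry later

The last two cases are exactly the not-in-cache vs in-cache-but-locked
distinction above, which is where the xfs_buf_find() tweak would come
in.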

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: task blocked for more than 120 seconds
  2010-11-04 17:32   ` Peter Zijlstra
  2010-11-04 18:16     ` Sergey Senozhatsky
@ 2010-11-05 11:14     ` Sergey Senozhatsky
  1 sibling, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2010-11-05 11:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Markus Trippelsdorf, Sergey Senozhatsky, Oleg Nesterov,
	KOSAKI Motohiro, Roland McGrath, David Howells, linux-kernel,
	Marcelo Tosatti, Ingo Molnar, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 583 bytes --]

On (11/04/10 18:32), Peter Zijlstra wrote:
> Subject: sched: Fixup cross sched_class wakeup preemption
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Sun Oct 31 12:37:04 CET 2010
> 
> Instead of dealing with sched classes inside each check_preempt_curr()
> implementation, pull out this logic into the generic wakeup preemption
> path.
> 
> This fixes a hang in KVM (and others) where we are waiting for the
> stop machine thread to run.
> 

Hello,

Peter, the patch seems to work.


Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>


	Sergey

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: task blocked for more than 120 seconds
  2010-11-04 17:32   ` Peter Zijlstra
@ 2010-11-04 18:16     ` Sergey Senozhatsky
  2010-11-05 11:14     ` Sergey Senozhatsky
  1 sibling, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2010-11-04 18:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Markus Trippelsdorf, Sergey Senozhatsky, Oleg Nesterov,
	KOSAKI Motohiro, Roland McGrath, David Howells, linux-kernel,
	Marcelo Tosatti, Ingo Molnar, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 5494 bytes --]

On (11/04/10 18:32), Peter Zijlstra wrote:
> On Thu, 2010-11-04 at 17:19 +0100, Markus Trippelsdorf wrote:
> > On Thu, Nov 04, 2010 at 05:58:26PM +0200, Sergey Senozhatsky wrote:
> > > Hello,
> > > Got the following traces:
> > > 
> > > [42001.449295] INFO: task openbox:17761 blocked for more than 120 seconds.
> > > [42001.449303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [42001.449311] openbox       D 0000000000000003     0 17761   3723 0x00000000
> > > [42001.449321]  ffff88004ce8dc38 0000000000000046 0000000000000000 ffff88004ce8c000
> > > [42001.449333]  ffff88004ce8dfd8 0000000000012040 ffff880133010000 ffff88004ce8dfd8
> > > [42001.449343]  0000000000012040 0000000000012040 ffff88004ce8dfd8 0000000000012040
> > > [42001.449354] Call Trace:
> > > [42001.449369]  [<ffffffff814212ff>] schedule_timeout+0x38/0x220
> > > [42001.449381]  [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72
> > > [42001.449389]  [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59
> > > [42001.449398]  [<ffffffff81036070>] ? get_parent_ip+0x11/0x41
> > > [42001.449406]  [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
> > > [42001.449413]  [<ffffffff8142057e>] wait_for_common+0xca/0x144
> > > [42001.449421]  [<ffffffff81038b13>] ? default_wake_function+0x0/0xf
> > > [42001.449429]  [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74
> > > [42001.449436]  [<ffffffff81420692>] wait_for_completion+0x18/0x1a
> > > [42001.449445]  [<ffffffff81084808>] stop_one_cpu+0x8c/0xba
> > 
> > These traces look related to:
> > http://article.gmane.org/gmane.linux.kernel/1055100
> > 
> > Does reverting 34f971f6f7988be4d014eec3e3526bee6d007ffa help in your
> > case?
> 
>

Hello,
I'll test "sched: Fixup cross sched_class wakeup preemption" first
and then (if it fails) I'll test reverting
34f971f6f7988be4d014eec3e3526bee6d007ffa.


	Sergey
 
> ---
> Subject: sched: Fixup cross sched_class wakeup preemption
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Sun Oct 31 12:37:04 CET 2010
> 
> Instead of dealing with sched classes inside each check_preempt_curr()
> implementation, pull out this logic into the generic wakeup preemption
> path.
> 
> This fixes a hang in KVM (and others) where we are waiting for the
> stop machine thread to run.
> 
> Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> LKML-Reference: <new-submission>
> ---
>  kernel/sched.c          |   39 ++++++++++++++++++++++++++++-----------
>  kernel/sched_fair.c     |    6 ------
>  kernel/sched_stoptask.c |    2 +-
>  3 files changed, 29 insertions(+), 18 deletions(-)
> 
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -560,18 +560,9 @@ struct rq {
>  
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
>  
> -static inline
> -void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
> -{
> -	rq->curr->sched_class->check_preempt_curr(rq, p, flags);
>  
> -	/*
> -	 * A queue event has occurred, and we're going to schedule.  In
> -	 * this case, we can save a useless back to back clock update.
> -	 */
> -	if (test_tsk_need_resched(p))
> -		rq->skip_clock_update = 1;
> -}
> +static inline
> +void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags);
>  
>  static inline int cpu_of(struct rq *rq)
>  {
> @@ -9400,4 +9391,30 @@ void synchronize_sched_expedited(void)
>  }
>  EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
>  
> +static inline
> +void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
> +{
> +	const struct sched_class *class;
> +
> +	if (p->sched_class == rq->curr->sched_class)
> +		rq->curr->sched_class->check_preempt_curr(rq, p, flags);
> +
> +	for_each_class(class) {
> +		if (class == rq->curr->sched_class)
> +			break;
> +		if (class == p->sched_class) {
> +			resched_task(rq->curr);
> +			break;
> +		}
> +	}
> +
> +	/*
> +	 * A queue event has occurred, and we're going to schedule.  In
> +	 * this case, we can save a useless back to back clock update.
> +	 */
> +	if (test_tsk_need_resched(rq->curr))
> +		rq->skip_clock_update = 1;
> +}
> +
> +
>  #endif /* #else #ifndef CONFIG_SMP */
> Index: linux-2.6/kernel/sched_fair.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched_fair.c
> +++ linux-2.6/kernel/sched_fair.c
> @@ -1654,12 +1654,6 @@ static void check_preempt_wakeup(struct
>  	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
>  	int scale = cfs_rq->nr_running >= sched_nr_latency;
>  
> -	if (unlikely(rt_prio(p->prio)))
> -		goto preempt;
> -
> -	if (unlikely(p->sched_class != &fair_sched_class))
> -		return;
> -
>  	if (unlikely(se == pse))
>  		return;
>  
> Index: linux-2.6/kernel/sched_stoptask.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched_stoptask.c
> +++ linux-2.6/kernel/sched_stoptask.c
> @@ -19,7 +19,7 @@ select_task_rq_stop(struct rq *rq, struc
>  static void
>  check_preempt_curr_stop(struct rq *rq, struct task_struct *p, int flags)
>  {
> -	resched_task(rq->curr); /* we preempt everything */
> +	/* we're never preempted */
>  }
>  
>  static struct task_struct *pick_next_task_stop(struct rq *rq)
> 

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: task blocked for more than 120 seconds
  2010-11-04 16:19 ` Markus Trippelsdorf
@ 2010-11-04 17:32   ` Peter Zijlstra
  2010-11-04 18:16     ` Sergey Senozhatsky
  2010-11-05 11:14     ` Sergey Senozhatsky
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2010-11-04 17:32 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Sergey Senozhatsky, Oleg Nesterov, KOSAKI Motohiro,
	Roland McGrath, David Howells, linux-kernel, Marcelo Tosatti,
	Ingo Molnar, Thomas Gleixner

On Thu, 2010-11-04 at 17:19 +0100, Markus Trippelsdorf wrote:
> On Thu, Nov 04, 2010 at 05:58:26PM +0200, Sergey Senozhatsky wrote:
> > Hello,
> > Got the following traces:
> > 
> > [42001.449295] INFO: task openbox:17761 blocked for more than 120 seconds.
> > [42001.449303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [42001.449311] openbox       D 0000000000000003     0 17761   3723 0x00000000
> > [42001.449321]  ffff88004ce8dc38 0000000000000046 0000000000000000 ffff88004ce8c000
> > [42001.449333]  ffff88004ce8dfd8 0000000000012040 ffff880133010000 ffff88004ce8dfd8
> > [42001.449343]  0000000000012040 0000000000012040 ffff88004ce8dfd8 0000000000012040
> > [42001.449354] Call Trace:
> > [42001.449369]  [<ffffffff814212ff>] schedule_timeout+0x38/0x220
> > [42001.449381]  [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72
> > [42001.449389]  [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59
> > [42001.449398]  [<ffffffff81036070>] ? get_parent_ip+0x11/0x41
> > [42001.449406]  [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
> > [42001.449413]  [<ffffffff8142057e>] wait_for_common+0xca/0x144
> > [42001.449421]  [<ffffffff81038b13>] ? default_wake_function+0x0/0xf
> > [42001.449429]  [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74
> > [42001.449436]  [<ffffffff81420692>] wait_for_completion+0x18/0x1a
> > [42001.449445]  [<ffffffff81084808>] stop_one_cpu+0x8c/0xba
> 
> These traces look related to:
> http://article.gmane.org/gmane.linux.kernel/1055100
> 
> Does reverting 34f971f6f7988be4d014eec3e3526bee6d007ffa help in your
> case?


---
Subject: sched: Fixup cross sched_class wakeup preemption
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Sun Oct 31 12:37:04 CET 2010

Instead of dealing with sched classes inside each check_preempt_curr()
implementation, pull out this logic into the generic wakeup preemption
path.

This fixes a hang in KVM (and others) where we are waiting for the
stop machine thread to run.

Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
---
 kernel/sched.c          |   39 ++++++++++++++++++++++++++++-----------
 kernel/sched_fair.c     |    6 ------
 kernel/sched_stoptask.c |    2 +-
 3 files changed, 29 insertions(+), 18 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -560,18 +560,9 @@ struct rq {
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 
-static inline
-void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
-{
-	rq->curr->sched_class->check_preempt_curr(rq, p, flags);
 
-	/*
-	 * A queue event has occurred, and we're going to schedule.  In
-	 * this case, we can save a useless back to back clock update.
-	 */
-	if (test_tsk_need_resched(p))
-		rq->skip_clock_update = 1;
-}
+static inline
+void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags);
 
 static inline int cpu_of(struct rq *rq)
 {
@@ -9400,4 +9391,30 @@ void synchronize_sched_expedited(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
 
+static inline
+void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
+{
+	const struct sched_class *class;
+
+	if (p->sched_class == rq->curr->sched_class)
+		rq->curr->sched_class->check_preempt_curr(rq, p, flags);
+
+	for_each_class(class) {
+		if (class == rq->curr->sched_class)
+			break;
+		if (class == p->sched_class) {
+			resched_task(rq->curr);
+			break;
+		}
+	}
+
+	/*
+	 * A queue event has occurred, and we're going to schedule.  In
+	 * this case, we can save a useless back to back clock update.
+	 */
+	if (test_tsk_need_resched(rq->curr))
+		rq->skip_clock_update = 1;
+}
+
+
 #endif /* #else #ifndef CONFIG_SMP */
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1654,12 +1654,6 @@ static void check_preempt_wakeup(struct
 	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
 	int scale = cfs_rq->nr_running >= sched_nr_latency;
 
-	if (unlikely(rt_prio(p->prio)))
-		goto preempt;
-
-	if (unlikely(p->sched_class != &fair_sched_class))
-		return;
-
 	if (unlikely(se == pse))
 		return;
 
Index: linux-2.6/kernel/sched_stoptask.c
===================================================================
--- linux-2.6.orig/kernel/sched_stoptask.c
+++ linux-2.6/kernel/sched_stoptask.c
@@ -19,7 +19,7 @@ select_task_rq_stop(struct rq *rq, struc
 static void
 check_preempt_curr_stop(struct rq *rq, struct task_struct *p, int flags)
 {
-	resched_task(rq->curr); /* we preempt everything */
+	/* we're never preempted */
 }
 
 static struct task_struct *pick_next_task_stop(struct rq *rq)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: task blocked for more than 120 seconds
  2010-11-04 15:58 Sergey Senozhatsky
@ 2010-11-04 16:19 ` Markus Trippelsdorf
  2010-11-04 17:32   ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Markus Trippelsdorf @ 2010-11-04 16:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Oleg Nesterov, KOSAKI Motohiro, Roland McGrath, David Howells,
	linux-kernel, Marcelo Tosatti, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner

On Thu, Nov 04, 2010 at 05:58:26PM +0200, Sergey Senozhatsky wrote:
> Hello,
> Got the following traces:
> 
> [42001.449295] INFO: task openbox:17761 blocked for more than 120 seconds.
> [42001.449303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [42001.449311] openbox       D 0000000000000003     0 17761   3723 0x00000000
> [42001.449321]  ffff88004ce8dc38 0000000000000046 0000000000000000 ffff88004ce8c000
> [42001.449333]  ffff88004ce8dfd8 0000000000012040 ffff880133010000 ffff88004ce8dfd8
> [42001.449343]  0000000000012040 0000000000012040 ffff88004ce8dfd8 0000000000012040
> [42001.449354] Call Trace:
> [42001.449369]  [<ffffffff814212ff>] schedule_timeout+0x38/0x220
> [42001.449381]  [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72
> [42001.449389]  [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59
> [42001.449398]  [<ffffffff81036070>] ? get_parent_ip+0x11/0x41
> [42001.449406]  [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
> [42001.449413]  [<ffffffff8142057e>] wait_for_common+0xca/0x144
> [42001.449421]  [<ffffffff81038b13>] ? default_wake_function+0x0/0xf
> [42001.449429]  [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74
> [42001.449436]  [<ffffffff81420692>] wait_for_completion+0x18/0x1a
> [42001.449445]  [<ffffffff81084808>] stop_one_cpu+0x8c/0xba

These traces look related to:
http://article.gmane.org/gmane.linux.kernel/1055100

Does reverting 34f971f6f7988be4d014eec3e3526bee6d007ffa help in your
case?

-- 
Markus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* task blocked for more than 120 seconds
@ 2010-11-04 15:58 Sergey Senozhatsky
  2010-11-04 16:19 ` Markus Trippelsdorf
  0 siblings, 1 reply; 21+ messages in thread
From: Sergey Senozhatsky @ 2010-11-04 15:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: KOSAKI Motohiro, Roland McGrath, David Howells, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4855 bytes --]

Hello,
Got the following traces:

[42001.449295] INFO: task openbox:17761 blocked for more than 120 seconds.
[42001.449303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42001.449311] openbox       D 0000000000000003     0 17761   3723 0x00000000
[42001.449321]  ffff88004ce8dc38 0000000000000046 0000000000000000 ffff88004ce8c000
[42001.449333]  ffff88004ce8dfd8 0000000000012040 ffff880133010000 ffff88004ce8dfd8
[42001.449343]  0000000000012040 0000000000012040 ffff88004ce8dfd8 0000000000012040
[42001.449354] Call Trace:
[42001.449369]  [<ffffffff814212ff>] schedule_timeout+0x38/0x220
[42001.449381]  [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72
[42001.449389]  [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59
[42001.449398]  [<ffffffff81036070>] ? get_parent_ip+0x11/0x41
[42001.449406]  [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
[42001.449413]  [<ffffffff8142057e>] wait_for_common+0xca/0x144
[42001.449421]  [<ffffffff81038b13>] ? default_wake_function+0x0/0xf
[42001.449429]  [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74
[42001.449436]  [<ffffffff81420692>] wait_for_completion+0x18/0x1a
[42001.449445]  [<ffffffff81084808>] stop_one_cpu+0x8c/0xba
[42001.449453]  [<ffffffff810355cb>] ? migration_cpu_stop+0x0/0x3d
[42001.449461]  [<ffffffff81035aef>] sched_exec+0xc3/0xdc
[42001.449470]  [<ffffffff810e7dad>] do_execve+0xaa/0x267
[42001.449479]  [<ffffffff8100a58b>] sys_execve+0x3e/0x55
[42001.449488]  [<ffffffff8100265c>] stub_execve+0x6c/0xc0
[42001.449494] 1 lock held by openbox/17761:
[42001.449498]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff810e7b1d>] prepare_bprm_creds+0x28/0x64

[47761.448699] INFO: task make:29123 blocked for more than 120 seconds.
[47761.448704] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47761.448710] make          D 0000000000000001     0 29123  24208 0x00000004
[47761.448719]  ffff880115dadcf0 0000000000000046 ffff880100000000 ffff880115dac000
[47761.448731]  ffff880115dadfd8 0000000000012040 ffff880133290000 ffff880115dadfd8
[47761.448741]  0000000000012040 0000000000012040 ffff880115dadfd8 0000000000012040
[47761.448752] Call Trace:
[47761.448766]  [<ffffffff814212ff>] schedule_timeout+0x38/0x220
[47761.448774]  [<ffffffff814204da>] ? wait_for_common+0x26/0x144
[47761.448784]  [<ffffffff81207b53>] ? do_raw_spin_lock+0x6b/0x122
[47761.448792]  [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59
[47761.448801]  [<ffffffff81036070>] ? get_parent_ip+0x11/0x41
[47761.448808]  [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
[47761.448816]  [<ffffffff8142057e>] wait_for_common+0xca/0x144
[47761.448823]  [<ffffffff81038b13>] ? default_wake_function+0x0/0xf
[47761.448831]  [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
[47761.448838]  [<ffffffff81420692>] wait_for_completion+0x18/0x1a
[47761.448845]  [<ffffffff8103cec0>] do_fork+0x2b0/0x32e
[47761.448856]  [<ffffffff810021ba>] ? sysret_check+0x2e/0x69
[47761.448864]  [<ffffffff8100a526>] sys_vfork+0x20/0x22
[47761.448871]  [<ffffffff81002563>] stub_vfork+0x13/0x20
[47761.448878]  [<ffffffff81002182>] ? system_call_fastpath+0x16/0x1b
[47761.448884] no locks held by make/29123.
[47761.448891] INFO: task make:31380 blocked for more than 120 seconds.
[47761.448895] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47761.448899] make          D 0000000000000000     0 31380  29123 0x00000000
[47761.448908]  ffff880157f01c38 0000000000000046 0000000000000000 ffff880157f00000
[47761.448918]  ffff880157f01fd8 0000000000012040 ffff8800773abea0 ffff880157f01fd8
[47761.448929]  0000000000012040 0000000000012040 ffff880157f01fd8 0000000000012040
[47761.448939] Call Trace:
[47761.448947]  [<ffffffff814212ff>] schedule_timeout+0x38/0x220
[47761.448957]  [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72
[47761.448965]  [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59
[47761.448972]  [<ffffffff81036070>] ? get_parent_ip+0x11/0x41
[47761.448979]  [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
[47761.448986]  [<ffffffff8142057e>] wait_for_common+0xca/0x144
[47761.448994]  [<ffffffff81038b13>] ? default_wake_function+0x0/0xf
[47761.449001]  [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74
[47761.449008]  [<ffffffff81420692>] wait_for_completion+0x18/0x1a
[47761.449017]  [<ffffffff81084808>] stop_one_cpu+0x8c/0xba
[47761.449025]  [<ffffffff810355cb>] ? migration_cpu_stop+0x0/0x3d
[47761.449033]  [<ffffffff81035aef>] sched_exec+0xc3/0xdc
[47761.449041]  [<ffffffff810e7dad>] do_execve+0xaa/0x267
[47761.449049]  [<ffffffff8100a58b>] sys_execve+0x3e/0x55
[47761.449056]  [<ffffffff8100265c>] stub_execve+0x6c/0xc0
[47761.449062] 1 lock held by make/31380:
[47761.449065]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff810e7b1d>] prepare_bprm_creds+0x28/0x64



	Sergey

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* task blocked for more than 120 seconds.
@ 2010-10-07 23:18 Shawn Bohrer
  0 siblings, 0 replies; 21+ messages in thread
From: Shawn Bohrer @ 2010-10-07 23:18 UTC (permalink / raw)
  To: linux-nfs; +Cc: linux-kernel

Hello,

I've got some machines with NFS-mounted home directories and recently
had the machines lock up with the output below from the kernel logs.
This machine was running 2.6.32.21 at the time, and was locked up for
at least 20 minutes before we rebooted.  We've had this happen to us
twice now, so while I haven't tried, I believe we can reproduce it in
about a day.

Does anyone have any insight on what may be happening here or any
suggestions for debugging?

Thanks,
Shawn

INFO: task java:24970 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff88189c50e198     0 24970  24239 0x00000000
 ffff88205b7339c8 0000000000000086 0000000000000000 000000005ae2d09c
 0000000000000286 0000000000000030 ffff88205b733988 000000010174a797
 ffff88205acd4a80 ffff88205b733fd8 000000000000e198 ffff88205acd4a80
Call Trace:
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81419df3>] io_schedule+0x73/0xc0
 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40
 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0
 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs]
 [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs]
 [<ffffffff8100c9ce>] ? common_interrupt+0xe/0x13
 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0
 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470
 [<ffffffff81047774>] ? __enqueue_entity+0x84/0x90
 [<ffffffff8104f7d5>] ? enqueue_task_fair+0x45/0x90
 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs]
 [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140
 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20
 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0
 [<ffffffff81130711>] sys_write+0x51/0x90
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b
INFO: task java:24982 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff8800283ee198     0 24982  24239 0x00000000
 ffff88104f6339c8 0000000000000086 0000000000000000 00000000ffffff10
 0000000000000286 0000000000000030 0000000000000282 000000010174a681
 ffff88105a592580 ffff88104f633fd8 000000000000e198 ffff88105a592580
Call Trace:
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81419df3>] io_schedule+0x73/0xc0
 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40
 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0
 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs]
 [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs]
 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0
 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470
 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs]
 [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140
 [<ffffffff810ffef9>] ? do_wp_page+0x109/0x7c0
 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20
 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0
 [<ffffffff81130711>] sys_write+0x51/0x90
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b
INFO: task hbitimestamp:23184 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
hbitimestamp  D ffff88109c6ae198     0 23184  23175 0x00000000
 ffff880855d179c8 0000000000000086 0000000000000000 000000005bb2d600
 0000000000000282 0000000000000030 0000000000000282 00000001017445ab
 ffff8808577bc640 ffff880855d17fd8 000000000000e198 ffff8808577bc640
Call Trace:
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81419df3>] io_schedule+0x73/0xc0
 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40
 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0
 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs]
 [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs]
 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0
 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470
 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs]
 [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140
 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8100cb4e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20
 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0
 [<ffffffff81130711>] sys_write+0x51/0x90
 [<ffffffff8141c74e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b
INFO: task java:24970 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff88189c50e198     0 24970  24239 0x00000000
 ffff88205b7339c8 0000000000000086 0000000000000000 000000005ae2d09c
 0000000000000286 0000000000000030 ffff88205b733988 000000010174a797
 ffff88205acd4a80 ffff88205b733fd8 000000000000e198 ffff88205acd4a80
Call Trace:
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81419df3>] io_schedule+0x73/0xc0
 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40
 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0
 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs]
 [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs]
 [<ffffffff8100c9ce>] ? common_interrupt+0xe/0x13
 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0
 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470
 [<ffffffff81047774>] ? __enqueue_entity+0x84/0x90
 [<ffffffff8104f7d5>] ? enqueue_task_fair+0x45/0x90
 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs]
 [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140
 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20
 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0
 [<ffffffff81130711>] sys_write+0x51/0x90
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b
INFO: task java:24982 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff8800283ee198     0 24982  24239 0x00000000
 ffff88104f6339c8 0000000000000086 0000000000000000 00000000ffffff10
 0000000000000286 0000000000000030 0000000000000282 000000010174a681
 ffff88105a592580 ffff88104f633fd8 000000000000e198 ffff88105a592580
Call Trace:
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81419df3>] io_schedule+0x73/0xc0
 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40
 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0
 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs]
 [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs]
 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0
 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470
 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs]
 [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140
 [<ffffffff810ffef9>] ? do_wp_page+0x109/0x7c0
 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20
 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0
 [<ffffffff81130711>] sys_write+0x51/0x90
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b
INFO: task tail:22115 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tail          D ffff88189c4ee198     0 22115  14494 0x00000000
 ffff88085b95bbd8 0000000000000086 0000000000000000 ffffffffa035d896
 ffff88205be57cc8 ffff88205be57cc8 ffff88205be57cc8 000000010174d70e
 ffff8808564fe400 ffff88085b95bfd8 000000000000e198 ffff8808564fe400
Call Trace:
 [<ffffffffa035d896>] ? __rpc_execute+0xd6/0x2a0 [sunrpc]
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81419df3>] io_schedule+0x73/0xc0
 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40
 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa04037d2>] nfs_sync_mapping_wait+0x122/0x260 [nfs]
 [<ffffffffa0403aa9>] nfs_write_mapping+0x79/0xb0 [nfs]
 [<ffffffffa0403afa>] nfs_wb_nocommit+0x1a/0x20 [nfs]
 [<ffffffffa03f60f8>] nfs_getattr+0x128/0x140 [nfs]
 [<ffffffff811351d1>] vfs_getattr+0x51/0x80
 [<ffffffff8113548f>] vfs_fstat+0x3f/0x60
 [<ffffffff811354d4>] sys_newfstat+0x24/0x40
 [<ffffffff810841a4>] ? sys_nanosleep+0x74/0x80
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b
INFO: task hbitimestamp:23184 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
hbitimestamp  D ffff88109c6ae198     0 23184  23175 0x00000000
 ffff880855d179c8 0000000000000086 0000000000000000 000000005bb2d600
 0000000000000282 0000000000000030 0000000000000282 00000001017445ab
 ffff8808577bc640 ffff880855d17fd8 000000000000e198 ffff8808577bc640
Call Trace:
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81419df3>] io_schedule+0x73/0xc0
 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40
 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0
 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs]
 [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs]
 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0
 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470
 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs]
 [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140
 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8100cb4e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20
 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0
 [<ffffffff81130711>] sys_write+0x51/0x90
 [<ffffffff8141c74e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b
INFO: task java:24970 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff88189c50e198     0 24970  24239 0x00000000
 ffff88205b7339c8 0000000000000086 0000000000000000 000000005ae2d09c
 0000000000000286 0000000000000030 ffff88205b733988 000000010174a797
 ffff88205acd4a80 ffff88205b733fd8 000000000000e198 ffff88205acd4a80
Call Trace:
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81419df3>] io_schedule+0x73/0xc0
 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40
 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0
 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs]
 [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs]
 [<ffffffff8100c9ce>] ? common_interrupt+0xe/0x13
 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0
 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470
 [<ffffffff81047774>] ? __enqueue_entity+0x84/0x90
 [<ffffffff8104f7d5>] ? enqueue_task_fair+0x45/0x90
 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs]
 [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140
 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20
 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0
 [<ffffffff81130711>] sys_write+0x51/0x90
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b
INFO: task java:24982 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff8800283ee198     0 24982  24239 0x00000000
 ffff88104f6339c8 0000000000000086 0000000000000000 00000000ffffff10
 0000000000000286 0000000000000030 0000000000000282 000000010174a681
 ffff88105a592580 ffff88104f633fd8 000000000000e198 ffff88105a592580
Call Trace:
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff81419df3>] io_schedule+0x73/0xc0
 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
 [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90
 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
 [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40
 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0
 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs]
 [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs]
 [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs]
 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0
 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470
 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs]
 [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140
 [<ffffffff810ffef9>] ? do_wp_page+0x109/0x7c0
 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20
 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0
 [<ffffffff81130711>] sys_write+0x51/0x90
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b
INFO: task hbitimestamp:24427 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
hbitimestamp  D ffff88089c40e198     0 24427  24317 0x00000000
 ffff88205a0b3c88 0000000000000082 0000000000000000 0000000000000000
 ffff88205a0b3ca8 ffffffff814196a8 0000000000000000 00000001017777de
 ffff88205a6ec8c0 ffff88205a0b3fd8 000000000000e198 ffff88205a6ec8c0
Call Trace:
 [<ffffffff814196a8>] ? thread_return+0x4e/0x726
 [<ffffffff8141aac1>] __mutex_lock_slowpath+0xf1/0x170
 [<ffffffff8141a9ab>] mutex_lock+0x2b/0x50
 [<ffffffff810e2879>] generic_file_aio_write+0x59/0xe0
 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs]
 [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140
 [<ffffffff810fcb91>] ? __do_fault+0x3e1/0x4c0
 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8116a68f>] ? inotify_inode_queue_event+0x2f/0x120
 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20
 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0
 [<ffffffff81130711>] sys_write+0x51/0x90
 [<ffffffff8141c74e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Task blocked for more than 120 seconds
  2008-03-24 10:21 Task " Samuel Tardieu
@ 2008-03-25  3:50 ` Neil Brown
  0 siblings, 0 replies; 21+ messages in thread
From: Neil Brown @ 2008-03-25  3:50 UTC (permalink / raw)
  To: Samuel Tardieu; +Cc: linux-raid, dm-devel


Wrong mailing list.
Copying to dm-devel@redhat.com.

NeilBrown

On Monday March 24, sam@rfc1149.net wrote:
> On my laptop, doing heavy C++ compilations in parallel with -j3 (this
> is a dual core) often generates the following trace:
> 
> INFO: task g++:25119 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> g++           D c03916c0     0 25119  25118
>        ecfef200 00200086 c038ef18 c03916c0 c03916c0 f77b2230 f77b2378 c17fa6c0 
>        00000000 f89817bc f7486280 000000ff f7a1eb20 00000000 00000000 00000000 
>        c17fa6c0 00000000 d1401e9c c17f0408 c02a2aa1 d1401e94 c0147100 c02a2c93 
> Call Trace:
>  [<f89817bc>] dm_table_unplug_all+0x1e/0x2e [dm_mod]
>  [<c02a2aa1>] io_schedule+0x1b/0x24
>  [<c0147100>] sync_page+0x33/0x36
>  [<c02a2c93>] __wait_on_bit+0x33/0x58
>  [<c01470cd>] sync_page+0x0/0x36
>  [<c01472fd>] wait_on_page_bit+0x59/0x60
>  [<c012cf7b>] wake_bit_function+0x0/0x3c
>  [<c014e20d>] truncate_inode_pages_range+0x238/0x29f
>  [<c014e27d>] truncate_inode_pages+0x9/0xc
>  [<f8c67187>] ext2_delete_inode+0x12/0x6e [ext2]
>  [<f8c67175>] ext2_delete_inode+0x0/0x6e [ext2]
>  [<c0170ea5>] generic_delete_inode+0x8f/0xf3
>  [<c0170819>] iput+0x60/0x62
>  [<c0169aa5>] do_unlinkat+0xb7/0xf9
>  [<c0113a3e>] do_page_fault+0x1fa/0x4dc
>  [<c0104822>] sysenter_past_esp+0x5f/0x85
>  =======================
> 
> This is with 2.6.25-rc6 (SMP) and has been present, as far as I can
> remember, since the beginning of the 2.6.25-rc series. It is not
> always reproducible, but the trace is always the same.
> 
> My filesystem is stored on an ext3 (rw,noatime) dm_crypt'd partition
> living in an LVM volume.
> 
> % lsmod | grep dm | grep -v ' 0 *$'
> dm_crypt               14340  1 
> crypto_blkcipher       18308  6 ecb,cbc,dm_crypt
> dm_mod                 53008  26 dm_crypt,dm_mirror,dm_snapshot
> 
> Here is the code I have in dm-table.o:
> 
> 00001042 <dm_table_unplug_all>:
>     1042:       56                      push   %esi
>     1043:       53                      push   %ebx
>     1044:       8b 98 a0 00 00 00       mov    0xa0(%eax),%ebx
>     104a:       8d b0 a0 00 00 00       lea    0xa0(%eax),%esi
>     1050:       eb 10                   jmp    1062 <dm_table_unplug_all+0x20>
>     1052:       8b 43 10                mov    0x10(%ebx),%eax
>     1055:       8b 40 5c                mov    0x5c(%eax),%eax
>     1058:       8b 40 34                mov    0x34(%eax),%eax
>     105b:       e8 fc ff ff ff          call   105c <dm_table_unplug_all+0x1a>
>     1060:       8b 1b                   mov    (%ebx),%ebx
>     1062:       8b 03                   mov    (%ebx),%eax
>     1064:       0f 1f 40 00             nopl   0x0(%eax)
>     1068:       39 f3                   cmp    %esi,%ebx
>     106a:       75 e6                   jne    1052 <dm_table_unplug_all+0x10>
>     106c:       5b                      pop    %ebx
>     106d:       5e                      pop    %esi
>     106e:       c3                      ret    
> 
> The symbol called at 105b is, after relocation, blk_unplug.
> 
> Is there anything else I can do to help debug this?
> 
>  Sam
> -- 
> Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Task blocked for more than 120 seconds
@ 2008-03-24 10:21 Samuel Tardieu
  2008-03-25  3:50 ` Neil Brown
  0 siblings, 1 reply; 21+ messages in thread
From: Samuel Tardieu @ 2008-03-24 10:21 UTC (permalink / raw)
  To: linux-raid

On my laptop, doing heavy C++ compilations in parallel with -j3 (this
is a dual core) often generates the following trace:

INFO: task g++:25119 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
g++           D c03916c0     0 25119  25118
       ecfef200 00200086 c038ef18 c03916c0 c03916c0 f77b2230 f77b2378 c17fa6c0 
       00000000 f89817bc f7486280 000000ff f7a1eb20 00000000 00000000 00000000 
       c17fa6c0 00000000 d1401e9c c17f0408 c02a2aa1 d1401e94 c0147100 c02a2c93 
Call Trace:
 [<f89817bc>] dm_table_unplug_all+0x1e/0x2e [dm_mod]
 [<c02a2aa1>] io_schedule+0x1b/0x24
 [<c0147100>] sync_page+0x33/0x36
 [<c02a2c93>] __wait_on_bit+0x33/0x58
 [<c01470cd>] sync_page+0x0/0x36
 [<c01472fd>] wait_on_page_bit+0x59/0x60
 [<c012cf7b>] wake_bit_function+0x0/0x3c
 [<c014e20d>] truncate_inode_pages_range+0x238/0x29f
 [<c014e27d>] truncate_inode_pages+0x9/0xc
 [<f8c67187>] ext2_delete_inode+0x12/0x6e [ext2]
 [<f8c67175>] ext2_delete_inode+0x0/0x6e [ext2]
 [<c0170ea5>] generic_delete_inode+0x8f/0xf3
 [<c0170819>] iput+0x60/0x62
 [<c0169aa5>] do_unlinkat+0xb7/0xf9
 [<c0113a3e>] do_page_fault+0x1fa/0x4dc
 [<c0104822>] sysenter_past_esp+0x5f/0x85
 =======================

This is with 2.6.25-rc6 (SMP) and has been present, as far as I can
remember, since the beginning of the 2.6.25-rc series. It is not
always reproducible, but the trace is always the same.

My filesystem is stored on an ext3 (rw,noatime) dm_crypt'd partition
living in an LVM volume.

% lsmod | grep dm | grep -v ' 0 *$'
dm_crypt               14340  1 
crypto_blkcipher       18308  6 ecb,cbc,dm_crypt
dm_mod                 53008  26 dm_crypt,dm_mirror,dm_snapshot

Here is the code I have in dm-table.o:

00001042 <dm_table_unplug_all>:
    1042:       56                      push   %esi
    1043:       53                      push   %ebx
    1044:       8b 98 a0 00 00 00       mov    0xa0(%eax),%ebx
    104a:       8d b0 a0 00 00 00       lea    0xa0(%eax),%esi
    1050:       eb 10                   jmp    1062 <dm_table_unplug_all+0x20>
    1052:       8b 43 10                mov    0x10(%ebx),%eax
    1055:       8b 40 5c                mov    0x5c(%eax),%eax
    1058:       8b 40 34                mov    0x34(%eax),%eax
    105b:       e8 fc ff ff ff          call   105c <dm_table_unplug_all+0x1a>
    1060:       8b 1b                   mov    (%ebx),%ebx
    1062:       8b 03                   mov    (%ebx),%eax
    1064:       0f 1f 40 00             nopl   0x0(%eax)
    1068:       39 f3                   cmp    %esi,%ebx
    106a:       75 e6                   jne    1052 <dm_table_unplug_all+0x10>
    106c:       5b                      pop    %ebx
    106d:       5e                      pop    %esi
    106e:       c3                      ret    

The symbol called at 105b is, after relocation, blk_unplug.

Is there anything else I can do to help debug this?

 Sam
-- 
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-09-27 22:49 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-18 15:11 task blocked for more than 120 seconds Josef 'Jeff' Sipek
2012-04-18 18:28 ` Ben Myers
2012-04-18 23:48 ` Dave Chinner
2012-04-19 15:46   ` Josef 'Jeff' Sipek
2012-04-19 22:56     ` Dave Chinner
2012-04-20 13:58       ` Josef 'Jeff' Sipek
2012-04-21  0:29         ` Dave Chinner
2012-04-23 17:16           ` Josef 'Jeff' Sipek
2012-04-23 20:24           ` Josef 'Jeff' Sipek
2012-04-23 23:27             ` Dave Chinner
2012-04-24 15:10               ` Josef 'Jeff' Sipek
2012-09-27 12:49               ` Josef 'Jeff' Sipek
2012-09-27 22:50                 ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2010-11-04 15:58 Sergey Senozhatsky
2010-11-04 16:19 ` Markus Trippelsdorf
2010-11-04 17:32   ` Peter Zijlstra
2010-11-04 18:16     ` Sergey Senozhatsky
2010-11-05 11:14     ` Sergey Senozhatsky
2010-10-07 23:18 Shawn Bohrer
2008-03-24 10:21 Task " Samuel Tardieu
2008-03-25  3:50 ` Neil Brown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.