* xfs deadlock on buffer semaphore while reading directory
@ 2013-02-02 19:20 Andi Kleen
  2013-02-02 19:54 ` Mark Tinguely
  0 siblings, 1 reply; 9+ messages in thread

From: Andi Kleen @ 2013-02-02 19:20 UTC (permalink / raw)
To: xfs

On an older 3.6-rc9 openSUSE kernel I had the following deadlock with
a "find" running on a USB hard disk. I first thought it was some IO
request getting stuck from another process (USB can be flakey), but
after looking through sysrq-t there is nothing else active in XFS.
So it looks like some kind of XFS race or deadlock?

-Andi

[233265.161582] find            D ffff88042a8e4820     0 17774      1 0x00000004
[233265.161586]  ffff880212ad36c8 0000000000000086 0000000000000000 ffff880212ad3fd8
[233265.161590]  ffff880212ad3fd8 000000000000a000 ffff88042d9596c0 ffff88042a8e4440
[233265.161593]  00000000000128c0 ffff88042a192a00 ffff88042a192a00 0000000000000000
[233265.161596] Call Trace:
[233265.161606]  [<ffffffff81531eec>] ? __schedule+0x3fc/0x8c0
[233265.161610]  [<ffffffff8110a1c0>] ? get_page_from_freelist+0x170/0x310
[233265.161628]  [<ffffffffa06207e3>] ? _xfs_buf_find+0xe3/0x240 [xfs]
[233265.161632]  [<ffffffff81532469>] schedule+0x29/0x70
[233265.161635]  [<ffffffff815308c5>] schedule_timeout+0x1d5/0x230
[233265.161638]  [<ffffffff8110a9d5>] ? __alloc_pages_nodemask+0xe5/0x7e0
[233265.161650]  [<ffffffffa06207e3>] ? _xfs_buf_find+0xe3/0x240 [xfs]
[233265.161654]  [<ffffffff815314c1>] __down+0x6a/0x97
[233265.161657]  [<ffffffff81065ee1>] down+0x41/0x50
[233265.161669]  [<ffffffffa0620634>] xfs_buf_lock+0x44/0x110 [xfs]
[233265.161680]  [<ffffffffa06207e3>] _xfs_buf_find+0xe3/0x240 [xfs]
[233265.161692]  [<ffffffffa0620ba1>] xfs_buf_get_map+0x171/0x1b0 [xfs]
[233265.161703]  [<ffffffffa06218fd>] xfs_buf_read_map+0x2d/0x110 [xfs]
[233265.161739]  [<ffffffffa06550c9>] ? xfs_dabuf_map.isra.2+0x239/0x250 [xfs]
[233265.161759]  [<ffffffffa067c3f5>] xfs_trans_read_buf_map+0x265/0x480 [xfs]
[233265.161776]  [<ffffffffa0656486>] xfs_da_read_buf+0xc6/0x1f0 [xfs]
[233265.161787]  [<ffffffffa06202de>] ? xfs_buf_rele+0x4e/0x130 [xfs]
[233265.161799]  [<ffffffffa062096d>] ? xfs_buf_unlock+0x2d/0xa0 [xfs]
[233265.161814]  [<ffffffffa0657893>] xfs_da_node_lookup_int+0xa3/0x2c0 [xfs]
[233265.161830]  [<ffffffffa0660161>] xfs_dir2_node_lookup+0x51/0x170 [xfs]
[233265.161845]  [<ffffffffa0658b86>] ? xfs_dir2_isleaf+0x26/0x60 [xfs]
[233265.161860]  [<ffffffffa065911d>] xfs_dir_lookup+0x15d/0x170 [xfs]
[233265.161874]  [<ffffffffa063488f>] xfs_lookup+0xcf/0x130 [xfs]
[233265.161887]  [<ffffffffa062b9d1>] xfs_vn_lookup+0x51/0x90 [xfs]
[233265.161891]  [<ffffffff81159a8b>] ? lookup_dcache+0xab/0xd0
[233265.161894]  [<ffffffff8115952d>] lookup_real+0x1d/0x60
[233265.161898]  [<ffffffff81159ae8>] __lookup_hash+0x38/0x50
[233265.161901]  [<ffffffff8115a7ee>] lookup_slow+0x4e/0xc0
[233265.161904]  [<ffffffff8115c7ef>] path_lookupat+0x73f/0x790
[233265.161908]  [<ffffffff8115c871>] do_path_lookup+0x31/0xc0
[233265.161911]  [<ffffffff8115eb69>] user_path_at_empty+0x59/0xa0
[233265.161915]  [<ffffffff8116c879>] ? mntput_no_expire+0x49/0x160
[233265.161918]  [<ffffffff811535d7>] ? cp_new_stat+0x107/0x120
[233265.161921]  [<ffffffff8115ebc1>] user_path_at+0x11/0x20
[233265.161924]  [<ffffffff811537ba>] vfs_fstatat+0x3a/0x70
[233265.161927]  [<ffffffff811539ea>] sys_newfstatat+0x1a/0x40
[233265.161930]  [<ffffffff8153b712>] system_call_fastpath+0x16/0x1b

-- 
ak@linux.intel.com -- Speaking for myself only.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread
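The trace above shows the find task asleep in down() on a per-buffer semaphore taken in xfs_buf_lock(), during a directory lookup. As a rough illustration of that failure mode only -- not the kernel code itself, and all names here are made up for the sketch -- this is what blocking on a non-reentrant lock whose holder never releases it looks like; a timeout is used so the sketch terminates instead of hanging:

```python
import threading

# Stand-in for the per-buffer semaphore the trace is sleeping on.
buf_sem = threading.Lock()

# First owner takes the buffer and, due to the bug, never releases it.
buf_sem.acquire()

# A second acquirer -- here, the directory-lookup path -- would block
# indefinitely, which is what "find" sitting in D state looks like.
# acquire(timeout=...) returns False instead of sleeping forever.
got_it = buf_sem.acquire(timeout=0.2)
print(got_it)  # False: the lookup never gets the buffer lock
```

The fix for such hangs is making the holder release the semaphore on every exit path (including error paths), which is what the patch series referenced below addresses in the real code.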
* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-02 19:20 xfs deadlock on buffer semaphore while reading directory Andi Kleen
@ 2013-02-02 19:54 ` Mark Tinguely
  2013-02-02 20:46   ` Andi Kleen
  2013-02-03  6:35   ` Linda Walsh
  0 siblings, 2 replies; 9+ messages in thread

From: Mark Tinguely @ 2013-02-02 19:54 UTC (permalink / raw)
To: Andi Kleen; +Cc: xfs

On 02/02/13 13:20, Andi Kleen wrote:
>
> On an older 3.6-rc9 openSUSE kernel I had the following deadlock with
> a "find" running on a USB hard disk. I first thought it was some
> IO request getting stuck from another process (USB can be flakey),
> but after looking through sysrq-t there is nothing else active
> in XFS. So it looks like some kind of XFS race or deadlock?
>
> -Andi
>
> [full backtrace snipped -- see the parent message]

That looks like a hang fixed by the series:

  http://oss.sgi.com/archives/xfs/2012-12/msg00071.html

--Mark Tinguely.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-02 19:54 ` Mark Tinguely
@ 2013-02-02 20:46 ` Andi Kleen
  2013-02-05 23:59   ` Mark Tinguely
  0 siblings, 1 reply; 9+ messages in thread

From: Andi Kleen @ 2013-02-02 20:46 UTC (permalink / raw)
To: Mark Tinguely; +Cc: Andi Kleen, xfs

> That looks like a hang fixed by the series:
>
>   http://oss.sgi.com/archives/xfs/2012-12/msg00071.html

Great that it is already fixed. Thanks.

Is the fix considered for stable?

-Andi

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-02 20:46 ` Andi Kleen
@ 2013-02-05 23:59 ` Mark Tinguely
  0 siblings, 0 replies; 9+ messages in thread

From: Mark Tinguely @ 2013-02-05 23:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: xfs

On 02/02/13 14:46, Andi Kleen wrote:
>> That looks like a hang fixed by the series:
>>
>>   http://oss.sgi.com/archives/xfs/2012-12/msg00071.html
>
> Great that it is already fixed. Thanks.
>
> Is the fix considered for stable?
>
> -Andi

Thanks for the reminder.

--Mark.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-02 19:54 ` Mark Tinguely
  2013-02-02 20:46   ` Andi Kleen
@ 2013-02-03  6:35 ` Linda Walsh
  2013-02-03 16:42   ` Stan Hoeppner
  2013-02-06  0:44   ` Dave Chinner
  1 sibling, 2 replies; 9+ messages in thread

From: Linda Walsh @ 2013-02-03 6:35 UTC (permalink / raw)
To: xfs

Mark Tinguely wrote:
> On 02/02/13 13:20, Andi Kleen wrote:
>>
>> On an older 3.6-rc9 openSUSE kernel I had the following deadlock with
>
> That looks like a hang fixed by the series:
>
>   http://oss.sgi.com/archives/xfs/2012-12/msg00071.html
----
That may be so -- I used to see things like that on some older kernels
with random programs, but with traces down in XFS. It seemed to be rare
and not cause problems... but I don't notice those any more.

Odd thing about my current problems -- my current system has been up 12
days... but before that it had been up 43 days.

I can't get the buffers to 'free' no matter what;
echo 3 > /proc/sys/vm/drop_caches does nothing, so I may be rebooting
soon...

The large number from that log (it was from 2/1, 4:30am) was because it
was level-0 backup day....

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-03  6:35 ` Linda Walsh
@ 2013-02-03 16:42 ` Stan Hoeppner
  2013-02-06  0:44 ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread

From: Stan Hoeppner @ 2013-02-03 16:42 UTC (permalink / raw)
To: Linda Walsh; +Cc: xfs

On 2/3/2013 12:35 AM, Linda Walsh wrote:
> Odd thing about my current probs -- my current system has been
> up 12 days... but before that it had been up 43 days...

Linux greer 3.2.6 #1 SMP Mon Feb 20 17:05:10 CST 2012 i686 GNU/Linux
 10:13:13 up 339 days, 22:42,  1 user,  load average: 0.06, 0.06, 0.05

This is a hand-rolled 'minimalist' kernel with the old SLAB allocator,
no modules, Debian 6 atop. Just over 11 months of flawless operation,
though load is probably much lighter than on your system.

> I can't get the buffers to 'free' no matter what
> echo 3 > /proc/sys/vm/drop_caches does nothing.

That's interesting.

> so I may be rebooting soon...

Did you roll this 3.7.1 yourself? I'm wondering if you did something
wonky in your config that's causing or contributing to your problems.

-- 
Stan

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-03  6:35 ` Linda Walsh
  2013-02-03 16:42   ` Stan Hoeppner
@ 2013-02-06  0:44 ` Dave Chinner
  2013-02-06 19:28   ` Linda Walsh
  2013-02-10  2:04   ` Linda Walsh
  1 sibling, 2 replies; 9+ messages in thread

From: Dave Chinner @ 2013-02-06 0:44 UTC (permalink / raw)
To: Linda Walsh; +Cc: xfs

On Sat, Feb 02, 2013 at 10:35:55PM -0800, Linda Walsh wrote:
> Mark Tinguely wrote:
> > On 02/02/13 13:20, Andi Kleen wrote:
> > >
> > > On an older 3.6-rc9 openSUSE kernel I had the following deadlock with
> >
> > That looks like a hang fixed by the series:
> >
> >   http://oss.sgi.com/archives/xfs/2012-12/msg00071.html
> ----
> That may be so -- I used to see things like that on some older
> kernels with random progs but with traces down in xfs..
>
> Seemed to be rare and not cause problems... but don't notice those
> any more...
>
> Odd thing about my current probs -- my current system has been
> up 12 days... but before that it had been up 43 days...
>
> I can't get the buffers to 'free' no matter what;
> echo 3 > /proc/sys/vm/drop_caches does nothing.

What buffers are you talking about? The deadlock is in metadata
buffer handling, which you can't directly see, and will never be
able to entirely free via drop caches.

If you are talking about what is reported by the "free" command,
then that number can be ignored as it is mostly meaningless for XFS
filesystems....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread
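To make the point about free(1) concrete: much of what that tool reports as "used" is clean page cache, buffers and reclaimable slab, which the kernel gives back under memory pressure. A small, hedged Python sketch that reads the raw counters instead (field names are the standard /proc/meminfo ones; SReclaimable may be absent on very old kernels, so it is looked up defensively):

```python
def meminfo_kb():
    """Parse /proc/meminfo into a {field_name: kilobytes} dict."""
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            name, rest = line.split(":", 1)
            fields[name] = int(rest.split()[0])  # values are in kB
    return fields

m = meminfo_kb()
# Memory that looks "used" to free(1) but is largely reclaimable cache:
reclaimable = m.get("Cached", 0) + m.get("Buffers", 0) + m.get("SReclaimable", 0)
print(f"~{reclaimable} kB of {m['MemTotal']} kB total is reclaimable cache")
```

This is only an approximation -- as Dave notes, XFS metadata buffers are not directly visible here at all -- but it shows why a large cache figure by itself is not evidence of a leak.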
* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-06  0:44 ` Dave Chinner
@ 2013-02-06 19:28 ` Linda Walsh
  2013-02-10  2:04 ` Linda Walsh
  0 siblings, 0 replies; 9+ messages in thread

From: Linda Walsh @ 2013-02-06 19:28 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs

Dave Chinner wrote:
> On Sat, Feb 02, 2013 at 10:35:55PM -0800, Linda Walsh wrote:
>> Odd thing about my current probs -- my current system has been up 12
>> days... but before that it had been up 43 days...
>>
>> I can't get the buffers to 'free' no matter what;
>> echo 3 > /proc/sys/vm/drop_caches does nothing.
>
> What buffers are you talking about? The deadlock is in metadata buffer
> handling, which you can't directly see, and will never be able to
> entirely free via drop caches.
>
> If you are talking about what is reported by the "free" command, then
> that number can be ignored as it is mostly meaningless for XFS
> filesystems....
---
Supposedly it was cached fs _data_ (not allocated buffers) that wasn't
dirty -- something that should have been freeable, but was eating ~40G
of memory, couldn't (or wasn't being) "written out to swap", yet wasn't
being released to reduce memory pressure. (I dunno if xfsdump's "OOM"
errors would trigger or give a hint to the kernel to release some
non-dirty fs-cache space, but from a system-stability point of view it
seemed like it "should" have. Maybe only the failure of lower-order
memory allocations triggers mem-release routines, I dunno.)

I'd guess it wasn't XFS metadata, as that would more likely be
temporarily pinned in memory until it had been dealt with (examined, or
modified for output), but that's a pure guess.

Part of me wondered if it might have been some in-memory tmp file, since
SuSE has recently put /run, /var/lock, /var/run and /media on tmpfs (in
addition to the more standard /dev and /sys/fs/cgroup/{one dir for each
group}). Other diskless in-memory filesystems include securityfs, devpts,
sysfs, /proc/sys/fs/binfmt_misc, copies of /proc for chrooted procs, and
/proc/fs/nfsd. Ones that might reserve space: debugfs and /dev/shm --
though both indicated under 32M of space used (despite /tmp/shm having
about 9G in a sparse file).

None of those _appeared_ to be a problem, though with all the small
files in memory, it's possible fragmentation was an issue. It's a recent
change that makes me a little uneasy. Nothing appeared to be glomming on
to the memory (max usage was 10% by 'mbuffer' with 5G pinned to buffer
xfsdump output), and total "Used" memory (including 'Shared') was under
8G (on a 2x24G NUMA config). It was a bit weird.

Have since rebooted with 3.6.7 and switched the allocator from the
unqueued one (better for cache-line usage, which I thought might help
speed) to the queued general-purpose one. Only been up ~12 hours, and
the previous problems didn't appear till after more than a week, BUT
simply moving to the new "git" xfsdump (on the old, unrebooted system)
cured the metadata alloc failures.

So no worries at this point... more than likely that patch to the xfs
utils/dump fixed the prob.

Thanks again!
linda

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-06  0:44 ` Dave Chinner
  2013-02-06 19:28   ` Linda Walsh
@ 2013-02-10  2:04 ` Linda Walsh
  1 sibling, 0 replies; 9+ messages in thread

From: Linda Walsh @ 2013-02-10 2:04 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs

Dave Chinner wrote:
>> I can't get the buffers to 'free' no matter what;
>> echo 3 > /proc/sys/vm/drop_caches does nothing.
>
> What buffers are you talking about? The deadlock is in metadata
> buffer handling, which you can't directly see, and will never be
> able to entirely free via drop caches.
>
> If you are talking about what is reported by the "free" command,
> then that number can be ignored as it is mostly meaningless for XFS
> filesystems....
----
Actually I was talking about 'cache', not buffers... sorry. I thought
that was mostly fs data that was "freeable" (mostly). Because of the
error in xfsdump, I was concerned about it taking up all free memory
(whereas normally it doesn't matter), and I was trying to use
echo 3 > /proc/sys/vm/drop_caches, but that seemed to be pretty much
ignored...

But after a reboot and a new kernel (with new and different problems),
xfsdump (the new version) works fine... ;)

Sorry for any confusion.

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-02-10  2:05 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-02 19:20 xfs deadlock on buffer semaphore while reading directory Andi Kleen
2013-02-02 19:54 ` Mark Tinguely
2013-02-02 20:46   ` Andi Kleen
2013-02-05 23:59     ` Mark Tinguely
2013-02-03  6:35   ` Linda Walsh
2013-02-03 16:42     ` Stan Hoeppner
2013-02-06  0:44     ` Dave Chinner
2013-02-06 19:28       ` Linda Walsh
2013-02-10  2:04       ` Linda Walsh