* xfs deadlock on buffer semaphore while reading directory
@ 2013-02-02 19:20 Andi Kleen
  2013-02-02 19:54 ` Mark Tinguely
  0 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2013-02-02 19:20 UTC (permalink / raw)
  To: xfs


On an older 3.6-rc9 openSUSE kernel I had the following deadlock with
a "find" running on a USB hard disk. I first thought it was some
IO request from another process getting stuck (USB can be flaky),
but after looking through sysrq-t there is nothing else active
in XFS. So it looks like some kind of XFS race or deadlock?

-Andi

[233265.161582] find            D ffff88042a8e4820     0 17774      1 0x00000004
[233265.161586]  ffff880212ad36c8 0000000000000086 0000000000000000 ffff880212ad3fd8
[233265.161590]  ffff880212ad3fd8 000000000000a000 ffff88042d9596c0 ffff88042a8e4440
[233265.161593]  00000000000128c0 ffff88042a192a00 ffff88042a192a00 0000000000000000
[233265.161596] Call Trace:
[233265.161606]  [<ffffffff81531eec>] ? __schedule+0x3fc/0x8c0
[233265.161610]  [<ffffffff8110a1c0>] ? get_page_from_freelist+0x170/0x310
[233265.161628]  [<ffffffffa06207e3>] ? _xfs_buf_find+0xe3/0x240 [xfs]
[233265.161632]  [<ffffffff81532469>] schedule+0x29/0x70
[233265.161635]  [<ffffffff815308c5>] schedule_timeout+0x1d5/0x230
[233265.161638]  [<ffffffff8110a9d5>] ? __alloc_pages_nodemask+0xe5/0x7e0
[233265.161650]  [<ffffffffa06207e3>] ? _xfs_buf_find+0xe3/0x240 [xfs]
[233265.161654]  [<ffffffff815314c1>] __down+0x6a/0x97
[233265.161657]  [<ffffffff81065ee1>] down+0x41/0x50
[233265.161669]  [<ffffffffa0620634>] xfs_buf_lock+0x44/0x110 [xfs]
[233265.161680]  [<ffffffffa06207e3>] _xfs_buf_find+0xe3/0x240 [xfs]
[233265.161692]  [<ffffffffa0620ba1>] xfs_buf_get_map+0x171/0x1b0 [xfs]
[233265.161703]  [<ffffffffa06218fd>] xfs_buf_read_map+0x2d/0x110 [xfs]
[233265.161739]  [<ffffffffa06550c9>] ? xfs_dabuf_map.isra.2+0x239/0x250 [xfs]
[233265.161759]  [<ffffffffa067c3f5>] xfs_trans_read_buf_map+0x265/0x480 [xfs]
[233265.161776]  [<ffffffffa0656486>] xfs_da_read_buf+0xc6/0x1f0 [xfs]
[233265.161787]  [<ffffffffa06202de>] ? xfs_buf_rele+0x4e/0x130 [xfs]
[233265.161799]  [<ffffffffa062096d>] ? xfs_buf_unlock+0x2d/0xa0 [xfs]
[233265.161814]  [<ffffffffa0657893>] xfs_da_node_lookup_int+0xa3/0x2c0 [xfs]
[233265.161830]  [<ffffffffa0660161>] xfs_dir2_node_lookup+0x51/0x170 [xfs]
[233265.161845]  [<ffffffffa0658b86>] ? xfs_dir2_isleaf+0x26/0x60 [xfs]
[233265.161860]  [<ffffffffa065911d>] xfs_dir_lookup+0x15d/0x170 [xfs]
[233265.161874]  [<ffffffffa063488f>] xfs_lookup+0xcf/0x130 [xfs]
[233265.161887]  [<ffffffffa062b9d1>] xfs_vn_lookup+0x51/0x90 [xfs]
[233265.161891]  [<ffffffff81159a8b>] ? lookup_dcache+0xab/0xd0
[233265.161894]  [<ffffffff8115952d>] lookup_real+0x1d/0x60
[233265.161898]  [<ffffffff81159ae8>] __lookup_hash+0x38/0x50
[233265.161901]  [<ffffffff8115a7ee>] lookup_slow+0x4e/0xc0
[233265.161904]  [<ffffffff8115c7ef>] path_lookupat+0x73f/0x790
[233265.161908]  [<ffffffff8115c871>] do_path_lookup+0x31/0xc0
[233265.161911]  [<ffffffff8115eb69>] user_path_at_empty+0x59/0xa0
[233265.161915]  [<ffffffff8116c879>] ? mntput_no_expire+0x49/0x160
[233265.161918]  [<ffffffff811535d7>] ? cp_new_stat+0x107/0x120
[233265.161921]  [<ffffffff8115ebc1>] user_path_at+0x11/0x20
[233265.161924]  [<ffffffff811537ba>] vfs_fstatat+0x3a/0x70
[233265.161927]  [<ffffffff811539ea>] sys_newfstatat+0x1a/0x40
[233265.161930]  [<ffffffff8153b712>] system_call_fastpath+0x16/0x1b
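A note on reading a trace like this one: frames marked with "?" are return addresses the kernel's stack unwinder found on the stack but could not confirm as part of the current call chain, so they are often stale leftovers. Dropping them leaves the chain the report actually rests on, from sys_newfstatat through the XFS directory lookup into xfs_buf_lock and down(). The Python sketch below does that filtering. The regular expression is an assumption fitted to the line format shown here (one frame per line), not an official parser, and reliable_frames is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

# Fitted to lines like the ones in this report (an assumption, not a formal spec):
#   [233265.161669]  [<ffffffffa0620634>] xfs_buf_lock+0x44/0x110 [xfs]
#   [233265.161628]  [<ffffffffa06207e3>] ? _xfs_buf_find+0xe3/0x240 [xfs]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"                 # return address
    r"(?P<unsure>\?\s+)?"                            # '?' = unwinder could not confirm the frame
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"     # symbol+offset/size
    r"(?:\s+\[(?P<module>\w+)\])?"                   # optional module name, e.g. [xfs]
)


def reliable_frames(trace: str) -> List[Tuple[str, Optional[str]]]:
    """Return (function, module) for each frame NOT marked with '?', innermost first.

    module is None for symbols that live in the core kernel image.
    """
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("unsure"):
            frames.append((m.group("func"), m.group("module")))
    return frames
```

Fed the trace above, the list starts with schedule, schedule_timeout, __down, down, xfs_buf_lock and _xfs_buf_find, i.e. find is asleep in down() waiting for a buffer's lock, which matches the subject line.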


-- 
ak@linux.intel.com -- Speaking for myself only.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-02 19:20 xfs deadlock on buffer semaphore while reading directory Andi Kleen
@ 2013-02-02 19:54 ` Mark Tinguely
  2013-02-02 20:46   ` Andi Kleen
  2013-02-03  6:35   ` Linda Walsh
  0 siblings, 2 replies; 9+ messages in thread
From: Mark Tinguely @ 2013-02-02 19:54 UTC (permalink / raw)
  To: Andi Kleen; +Cc: xfs

On 02/02/13 13:20, Andi Kleen wrote:
>
> On a older 3.6-rc9 opensuse kernel I had the following deadlock with
> a "find" running on a USB hard disk. I first thought it was some
> IO request getting stuck from another process (USB can be flakey),
> but after looking through sysrq-t there is nothing else active
> in XFS. So looks like some kind of XFS race or deadlock?
>
> -Andi

That looks like a hang fixed by the series:

	http://oss.sgi.com/archives/xfs/2012-12/msg00071.html

--Mark Tinguely.


* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-02 19:54 ` Mark Tinguely
@ 2013-02-02 20:46   ` Andi Kleen
  2013-02-05 23:59     ` Mark Tinguely
  2013-02-03  6:35   ` Linda Walsh
  1 sibling, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2013-02-02 20:46 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: Andi Kleen, xfs

> That looks like a hang fixed by the series:
> 
> 	http://oss.sgi.com/archives/xfs/2012-12/msg00071.html

Great that it is already fixed. Thanks.

Is the fix considered for stable?

-Andi


* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-02 19:54 ` Mark Tinguely
  2013-02-02 20:46   ` Andi Kleen
@ 2013-02-03  6:35   ` Linda Walsh
  2013-02-03 16:42     ` Stan Hoeppner
  2013-02-06  0:44     ` Dave Chinner
  1 sibling, 2 replies; 9+ messages in thread
From: Linda Walsh @ 2013-02-03  6:35 UTC (permalink / raw)
  To: xfs



Mark Tinguely wrote:
> On 02/02/13 13:20, Andi Kleen wrote:
>>
>> On a older 3.6-rc9 opensuse kernel I had the following deadlock with
>
> That looks like a hang fixed by the series:
> 
>     http://oss.sgi.com/archives/xfs/2012-12/msg00071.html
----

That may be so. I used to see things like that on some older
kernels, with random programs but with traces down in XFS.

They seemed to be rare and didn't cause problems, but I don't notice
them any more.

Odd thing about my current problems: my current system has been
up 12 days, but before that it had been up 43 days.


I can't get the buffers to 'free' no matter what;
echo 3> /proc/sys/vm/drop_caches does nothing.

So I may be rebooting soon.

The large number of [...] in that log (it was from 2/1, 4:30am)
was because it was level 0 backup day.
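For reference, /proc/sys/vm/drop_caches takes a single digit: 1 frees the page cache, 2 frees reclaimable slab objects (dentries and inodes), 3 does both. It only drops clean objects, so running sync first lets more be freed. One thing worth checking, if the command was typed exactly as written above: in a POSIX shell a digit placed directly against ">" is parsed as a file-descriptor number, so `echo 3> /proc/sys/vm/drop_caches` redirects descriptor 3 to the file and writes nothing to it, while `echo 3 > /proc/sys/vm/drop_caches` (with a space before ">") writes the digit. A minimal Python sketch of the same operation, assuming root and the standard proc path; drop_caches is a hypothetical helper name.

```python
import os

# Values documented for /proc/sys/vm/drop_caches:
#   1 = free the page cache
#   2 = free reclaimable slab objects (dentries and inodes)
#   3 = free both
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches(level: int = 3, path: str = DROP_CACHES_PATH) -> None:
    """Ask the kernel to drop clean caches (hypothetical helper; root needed for the real path)."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # drop_caches never frees dirty objects, so flush dirty data first to let more be dropped.
    os.sync()
    # Write the digit itself. The shell line `echo 3> FILE` does NOT do this:
    # "3>" is parsed as a redirection of file descriptor 3, so nothing is written.
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

Run as root, drop_caches() does the same as `sync; echo 3 > /proc/sys/vm/drop_caches`. Memory that is dirty or still in use is not released either way.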


* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-03  6:35   ` Linda Walsh
@ 2013-02-03 16:42     ` Stan Hoeppner
  2013-02-06  0:44     ` Dave Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: Stan Hoeppner @ 2013-02-03 16:42 UTC (permalink / raw)
  To: Linda Walsh; +Cc: xfs

On 2/3/2013 12:35 AM, Linda Walsh wrote:

> Odd thing about my current probs -- my current system has been
> up 12 days...but before that it had been up 43 days...

Linux greer 3.2.6 #1 SMP Mon Feb 20 17:05:10 CST 2012 i686 GNU/Linux
 10:13:13 up 339 days, 22:42,  1 user,  load average: 0.06, 0.06, 0.05

This is a hand-rolled 'minimalist' kernel with the old SLAB allocator,
no modules, Debian 6 atop.  Just over 11 months of flawless operation,
though load is probably much lighter than on your system.

> I can't get the buffers to 'free' no matter what
> echo 3> /proc/sys/vm/drop_caches does nothing.

That's interesting.

> so I may be rebooting soon...

Did you roll this 3.7.1 yourself?  I'm wondering if you did something
wonky in your config that's causing or contributing to your problems.

-- 
Stan



* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-02 20:46   ` Andi Kleen
@ 2013-02-05 23:59     ` Mark Tinguely
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Tinguely @ 2013-02-05 23:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: xfs

On 02/02/13 14:46, Andi Kleen wrote:
>> That looks like a hang fixed by the series:
>>
>> 	http://oss.sgi.com/archives/xfs/2012-12/msg00071.html
>
> Great that it is already fixed. Thanks.
>
> Is the fix considered for stable?
>
> -Andi

Thanks for the reminder.

--Mark.


* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-03  6:35   ` Linda Walsh
  2013-02-03 16:42     ` Stan Hoeppner
@ 2013-02-06  0:44     ` Dave Chinner
  2013-02-06 19:28       ` Linda Walsh
  2013-02-10  2:04       ` Linda Walsh
  1 sibling, 2 replies; 9+ messages in thread
From: Dave Chinner @ 2013-02-06  0:44 UTC (permalink / raw)
  To: Linda Walsh; +Cc: xfs

On Sat, Feb 02, 2013 at 10:35:55PM -0800, Linda Walsh wrote:
> 
> 
> Mark Tinguely wrote:
> >On 02/02/13 13:20, Andi Kleen wrote:
> >>
> >>On a older 3.6-rc9 opensuse kernel I had the following deadlock with
> >
> >That looks like a hang fixed by the series:
> >
> >    http://oss.sgi.com/archives/xfs/2012-12/msg00071.html
> ----
> 
> That may be so -- I used to see things like that on some older
> kernels with random progs but with traces down in xfs..
> 
> Seemed to be rare and not cause problems....but don't notice those any
> more...
> 
> Odd thing about my current probs -- my current system has been
> up 12 days...but before that it had been up 43 days...
> 
> 
> I can't get the buffers to 'free' no matter what
> echo 3> /proc/sys/vm/drop_caches does nothing.

What buffers are you talking about? The deadlock is in metadata
buffer handling, which you can't directly see, and will never be
able to entirely free via drop caches.

If you are talking about what is reported by the "free" command,
then that number can be ignored as it is mostly meaningless for XFS
filesystems....
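To make the buffers/cache distinction concrete: free(1) takes its numbers from /proc/meminfo, where "Buffers" (raw disk blocks) and "Cached" (file data in the page cache) are separate lines, and "Dirty" shows how much still has to be written back before it can be dropped; recent procps versions also count "SReclaimable" (reclaimable slab, e.g. dentries and inodes) in the cache column. As noted above, XFS metadata buffers are not something these counters expose directly. A minimal Python sketch that pulls those fields out, assuming the usual "Name:   value kB" line format; parse_meminfo and cache_summary are hypothetical helper names.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def cache_summary(fields):
    """Pick out the counters relevant to the 'buffers' vs 'cache' question (values in kB)."""
    return {
        "buffers": fields.get("Buffers", 0),                # raw disk blocks
        "page_cache": fields.get("Cached", 0),              # file data in the page cache
        "dirty": fields.get("Dirty", 0),                    # not yet written back; drop_caches can't free it
        "reclaimable_slab": fields.get("SReclaimable", 0),  # e.g. dentries and inodes
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(cache_summary(parse_meminfo(f.read())))
```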

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-06  0:44     ` Dave Chinner
@ 2013-02-06 19:28       ` Linda Walsh
  2013-02-10  2:04       ` Linda Walsh
  1 sibling, 0 replies; 9+ messages in thread
From: Linda Walsh @ 2013-02-06 19:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs



Dave Chinner wrote:
> On Sat, Feb 02, 2013 at 10:35:55PM -0800, Linda Walsh wrote:
>> Odd thing about my current probs -- my current system has been up 12
>> days...but before that it had been up 43 days...
>>
>> I can't get the buffers to 'free' no matter what echo 3>
>> /proc/sys/vm/drop_caches does nothing.
> 
> What buffers are you talking about? The deadlock is in metadata buffer
> handling, which you can't directly see, and will never be able to entirely
> free via drop caches. 
> 
> if you are talking about what is reported by the "free" command, then that
> number can be ignored as it is mostly meaningless for XFS filesystems....

---

Supposedly it was cached fs _data_ (not allocated buffers) that wasn't
dirty: something that should have been freeable, but was eating ~40G of
free memory. It couldn't be (or wasn't being) "written out to swap", yet it
wasn't being released to reduce memory pressure. (I don't know if xfsdump's
"OOM" errors would trigger, or hint to, the kernel to release some non-dirty
fs-cache space, but from a system-stability point of view it seemed like it
"should" have. Maybe only the failure of lower-order memory allocations
triggers the memory-release routines; I don't know.)

I'd guess it wasn't XFS metadata, as that would more likely be temporarily
pinned in memory until it had been dealt with (examined, or modified for
output), but that's a pure guess.

Part of me wondered if it might have been some in-memory tmp file, since SUSE
has recently put /run, /var/lock, /var/run and /media on tmpfs (in
addition to the more standard /dev and /sys/fs/cgroup/{one dir per group}).

Other diskless in-memory filesystems: securityfs, devpts, sysfs,
/proc/sys/fs/binfmt_misc, copies of /proc for chrooted processes, and
/proc/fs/nfsd.

Ones that might reserve space: debugfs and /dev/shm, though both
indicated under 32M of space used (despite /tmp/shm having about 9G in
a sparse file).

---- None of those _appeared_ to be a problem, though with all the small
files in memory, it's possible fragmentation was an issue.  It's a
recent change that makes me a little uneasy.


	Nothing appeared to be glomming on to the memory: max usage was 10%, by
'mbuffer' with 5G pinned to buffer xfsdump output, and total "Used" memory
(including 'Shared') was under 8G (on a 2x24G NUMA config).  It was a bit
weird.

	I have since rebooted with 3.6.7 and switched the slab allocator from the
unqueued one (better for cache-line usage, which I thought might help speed)
to the queued general-purpose one.  It has only been up ~12 hours, and the
previous problems didn't appear until after more than a week, BUT simply
moving to the new "git" xfsdump (on the old system, not rebooted) cured the
metadata allocation failures.

	So no worries at this point; more than likely that patch to the xfs
utils/dump fixed the problem.

	Thanks again!

linda



* Re: xfs deadlock on buffer semaphore while reading directory
  2013-02-06  0:44     ` Dave Chinner
  2013-02-06 19:28       ` Linda Walsh
@ 2013-02-10  2:04       ` Linda Walsh
  1 sibling, 0 replies; 9+ messages in thread
From: Linda Walsh @ 2013-02-10  2:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Dave Chinner wrote:
>> I can't get the buffers to 'free' no matter what
>> echo 3> /proc/sys/vm/drop_caches does nothing.
> 
> What buffers are you talking about? The deadlock is in metadata
> buffer handling, which you can't directly see, and will never be
> able to entirely free via drop caches.
> 
> if you are talking about what is reported by the "free" command,
> then that number can be ignored as it is mostly meaningless for XFS
> filesystems....
----
	Actually I was talking about 'cache', not buffers; sorry.
I thought that was mostly fs data that was "freeable" (mostly).
But because of the error in xfs_dump, I was concerned about
it taking up all free memory (normally it doesn't
matter).

I was trying to use echo 3>/proc/sys/vm/drop_caches, but that
seemed to be pretty much ignored...

But after a reboot and a new kernel (with new and different problems),
xfs_dump (the new version) works fine... ;)
Sorry for any confusion.


Thread overview: 9+ messages
2013-02-02 19:20 xfs deadlock on buffer semaphore while reading directory Andi Kleen
2013-02-02 19:54 ` Mark Tinguely
2013-02-02 20:46   ` Andi Kleen
2013-02-05 23:59     ` Mark Tinguely
2013-02-03  6:35   ` Linda Walsh
2013-02-03 16:42     ` Stan Hoeppner
2013-02-06  0:44     ` Dave Chinner
2013-02-06 19:28       ` Linda Walsh
2013-02-10  2:04       ` Linda Walsh