* open sleeps
@ 2008-06-19 5:27 Brian May
2008-06-19 5:30 ` Brian May
2008-06-19 6:21 ` Dave Chinner
0 siblings, 2 replies; 17+ messages in thread
From: Brian May @ 2008-06-19 5:27 UTC (permalink / raw)
To: xfs
Hello,
I am having (weird) issues with XFS, in that open(...) on certain files
takes 45 seconds to return. After the file has been opened, the next
file in the same directory takes 45 seconds. If the file was recently
opened it returns immediately.
I thought this was a low level I/O issue, so copied the files in
question to a completely independent RAID array (separate LVM, RAID,
controllers, disks), but the problem remains.
More details at thread starting from
<http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
Any ideas?
Please CC me responses.
Thanks.
Brian May
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: open sleeps
2008-06-19 5:27 open sleeps Brian May
@ 2008-06-19 5:30 ` Brian May
2008-06-19 6:21 ` Dave Chinner
1 sibling, 0 replies; 17+ messages in thread
From: Brian May @ 2008-06-19 5:30 UTC (permalink / raw)
To: Brian May; +Cc: xfs
Sorry, this is a repost due to the weird Followup-To header (I was hoping
Thunderbird would use Mail-Followup-To, but obviously not).
Brian May wrote:
> Hello,
>
> I am having (weird) issues with XFS, in that open(...) on certain
> files takes 45 seconds to return. After the file has been opened, the
> next file in the same directory takes 45 seconds. If the file was
> recently opened it returns immediately.
>
> I thought this was a low level I/O issue, so copied the files in
> question to a completely independent RAID array (separate LVM, RAID,
> controllers, disks), but the problem remains.
>
> More details at thread starting from
> <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
>
> Any ideas?
>
> Please CC me responses.
>
> Thanks.
>
> Brian May
* Re: open sleeps
2008-06-19 5:27 open sleeps Brian May
2008-06-19 5:30 ` Brian May
@ 2008-06-19 6:21 ` Dave Chinner
2008-06-19 6:40 ` Brian May
1 sibling, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2008-06-19 6:21 UTC (permalink / raw)
To: Brian May; +Cc: xfs
On Thu, Jun 19, 2008 at 03:27:48PM +1000, Brian May wrote:
> Hello,
>
> I am having (weird) issues with XFS, in that open(...) on certain files
> takes 45 seconds to return. After the file has been opened, the next
> file in the same directory takes 45 seconds. If the file was recently
> opened it returns immediately.
>
> I thought this was a low level I/O issue, so copied the files in
> question to a completely independent RAID array (separate LVM, RAID,
> controllers, disks), but the problem remains.
>
> More details at thread starting from
> <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
>
> Any ideas?
# echo t > /proc/sysrq-trigger
when it is hung to get a stack trace of the blocked open call.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: open sleeps
2008-06-19 6:21 ` Dave Chinner
@ 2008-06-19 6:40 ` Brian May
2008-06-19 8:43 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-19 6:40 UTC (permalink / raw)
To: Brian May, xfs
Dave Chinner wrote:
> On Thu, Jun 19, 2008 at 03:27:48PM +1000, Brian May wrote:
>
>> Hello,
>>
>> I am having (weird) issues with XFS, in that open(...) on certain files
>> takes 45 seconds to return. After the file has been opened, the next
>> file in the same directory takes 45 seconds. If the file was recently
>> opened it returns immediately.
>>
>> I thought this was a low level I/O issue, so copied the files in
>> question to a completely independent RAID array (separate LVM, RAID,
>> controllers, disks), but the problem remains.
>>
>> More details at thread starting from
>> <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
>>
>> Any ideas?
>>
>
> # echo t > /proc/sysrq-trigger
>
> when it is hung to get a stack trace of the blocked open call.
>
> Cheers,
>
> Dave.
>
Does the following help? I still have the logs of the other processes,
if required (just in case it is some weird interaction between multiple
processes?)
It seems to be pretty consistent with lock_timer_base, every time I look
(assuming I haven't read the stack trace upside down...).
Jun 19 16:33:30 hq kernel: grep S 00000000 0 12793 12567 (NOTLB)
Jun 19 16:33:30 hq kernel: f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550
Jun 19 16:33:30 hq kernel: 34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286
Jun 19 16:33:30 hq kernel: c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b
Jun 19 16:33:30 hq kernel: Call Trace:
Jun 19 16:33:30 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
Jun 19 16:33:30 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
Jun 19 16:33:30 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
Jun 19 16:33:30 hq kernel: [<c016c801>] __break_lease+0x2a8/0x2b9
Jun 19 16:33:30 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
Jun 19 16:33:30 hq kernel: [<c0166450>] may_open+0x125/0x203
Jun 19 16:33:30 hq kernel: [<c0168451>] open_namei+0x23d/0x5c8
Jun 19 16:33:30 hq kernel: [<c0158cb8>] do_filp_open+0x1c/0x31
Jun 19 16:33:30 hq kernel: [<c0161f38>] sys_stat64+0x1e/0x23
Jun 19 16:33:30 hq kernel: [<c0158d0b>] do_sys_open+0x3e/0xb3
Jun 19 16:33:30 hq kernel: [<c0158dad>] sys_open+0x16/0x18
Jun 19 16:33:30 hq kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
Jun 19 16:33:50 hq kernel: grep S 00000000 0 12793 12567 (NOTLB)
Jun 19 16:33:50 hq kernel: f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550
Jun 19 16:33:50 hq kernel: 34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286
Jun 19 16:33:50 hq kernel: c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b
Jun 19 16:33:50 hq kernel: Call Trace:
Jun 19 16:33:50 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
Jun 19 16:33:50 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
Jun 19 16:33:50 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
Jun 19 16:33:50 hq kernel: [<c016c801>] __break_lease+0x2a8/0x2b9
Jun 19 16:33:50 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
Jun 19 16:33:50 hq kernel: [<c0166450>] may_open+0x125/0x203
Jun 19 16:33:50 hq kernel: [<c0168451>] open_namei+0x23d/0x5c8
Jun 19 16:33:50 hq kernel: [<c0158cb8>] do_filp_open+0x1c/0x31
Jun 19 16:33:50 hq kernel: [<c0161f38>] sys_stat64+0x1e/0x23
Jun 19 16:33:50 hq kernel: [<c0158d0b>] do_sys_open+0x3e/0xb3
Jun 19 16:33:50 hq kernel: [<c0158dad>] sys_open+0x16/0x18
Jun 19 16:33:50 hq kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
Thanks.
Brian May
* Re: open sleeps
2008-06-19 6:40 ` Brian May
@ 2008-06-19 8:43 ` Christoph Hellwig
2008-06-19 9:48 ` Sebastian Brings
2008-06-20 1:47 ` Brian May
0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2008-06-19 8:43 UTC (permalink / raw)
To: Brian May; +Cc: xfs
On Thu, Jun 19, 2008 at 04:40:00PM +1000, Brian May wrote:
> Does the following help? I still have the logs of the other processes, if
> required (just in case it is some weird interaction between multiple
> processes?)
>
> It seems to be pretty consistent with lock_timer_base, every time I look
> (assuming I haven't read the stack trace upside down...).
>
> Jun 19 16:33:30 hq kernel: grep S 00000000 0 12793 12567 (NOTLB)
>
> Jun 19 16:33:30 hq kernel: f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550
> Jun 19 16:33:30 hq kernel: 34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286
> Jun 19 16:33:30 hq kernel: c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b
> Jun 19 16:33:30 hq kernel: Call Trace:
> Jun 19 16:33:30 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
> Jun 19 16:33:30 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
> Jun 19 16:33:30 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
> Jun 19 16:33:30 hq kernel: [<c016c801>] __break_lease+0x2a8/0x2b9
That's the lease breaking code in the VFS, long before we call
into XFS. Looks like someone (samba?) has a lease on this file and
we're having trouble getting it broken. Try sending a report about
this to linux-fsdevel@vger.kernel.org
* RE: open sleeps
2008-06-19 8:43 ` Christoph Hellwig
@ 2008-06-19 9:48 ` Sebastian Brings
2008-06-20 1:47 ` Brian May
1 sibling, 0 replies; 17+ messages in thread
From: Sebastian Brings @ 2008-06-19 9:48 UTC (permalink / raw)
To: xfs
> -----Original Message-----
> From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com]
> On Behalf Of Christoph Hellwig
> Sent: Donnerstag, 19. Juni 2008 10:43
> To: Brian May
> Cc: xfs@oss.sgi.com
> Subject: Re: open sleeps
>
> On Thu, Jun 19, 2008 at 04:40:00PM +1000, Brian May wrote:
> > Does the following help? I still have the logs of the other
> > processes, if required (just in case it is some weird interaction
> > between multiple processes?)
> >
> > It seems to be pretty consistent with lock_timer_base, every time
> > I look (assuming I haven't read the stack trace upside down...).
> >
> > Jun 19 16:33:30 hq kernel: grep S 00000000 0 12793 12567 (NOTLB)
> > Jun 19 16:33:30 hq kernel: f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550
> > Jun 19 16:33:30 hq kernel: 34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286
> > Jun 19 16:33:30 hq kernel: c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b
> > Jun 19 16:33:30 hq kernel: Call Trace:
> > Jun 19 16:33:30 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
> > Jun 19 16:33:30 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
> > Jun 19 16:33:30 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
> > Jun 19 16:33:30 hq kernel: [<c016c801>] __break_lease+0x2a8/0x2b9
>
> That's the lease breaking code in the VFS, long before we call
> into XFS. Looks like someone (samba?) has a lease on this file and
> we're having trouble getting it broken. Try sending a report about
> this to linux-fsdevel@vger.kernel.org
Out of curiosity: is the lease breaking you mention the breaking of a kernel oplock that Samba may hold?
* Re: open sleeps
2008-06-19 8:43 ` Christoph Hellwig
2008-06-19 9:48 ` Sebastian Brings
@ 2008-06-20 1:47 ` Brian May
2008-06-20 4:11 ` Brian May
1 sibling, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20 1:47 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
Christoph Hellwig wrote:
> On Thu, Jun 19, 2008 at 04:40:00PM +1000, Brian May wrote:
>
>> Does the following help? I still have the logs of the other processes, if
>> required (just in case it is some weird interaction between multiple
>> processes?)
>>
>> It seems to be pretty consistent with lock_timer_base, every time I look
>> (assuming I haven't read the stack trace upside down...).
>>
>> Jun 19 16:33:30 hq kernel: grep S 00000000 0 12793 12567 (NOTLB)
>>
>> Jun 19 16:33:30 hq kernel: f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550
>> Jun 19 16:33:30 hq kernel: 34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286
>> Jun 19 16:33:30 hq kernel: c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b
>> Jun 19 16:33:30 hq kernel: Call Trace:
>> Jun 19 16:33:30 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
>> Jun 19 16:33:30 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
>> Jun 19 16:33:30 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
>> Jun 19 16:33:30 hq kernel: [<c016c801>] __break_lease+0x2a8/0x2b9
>>
>
> That's the lease breaking code in the VFS, long before we call
> into XFS. Looks like someone (samba?) has a lease on this file and
> we're having trouble getting it broken. Try sending a report about
> this to linux-fsdevel@vger.kernel.org
>
I feel I am going around in circles.
Anyway, I started the discussion from
<http://www.archivum.info/linux-fsdevel@vger.kernel.org/2008-06/msg00337.html>.
In the last message (which isn't archived yet), I looked at the Samba
process that is holding the lease. The following is the stack trace of
that process. I don't understand why the XFS code would be calling e1000
code; the filesystem isn't attached via the network. Perhaps this means
the problem is in the network code?
Jun 20 10:54:37 hq kernel: smbd S 00000000 0 13516 11112 13459 (NOTLB)
Jun 20 10:54:37 hq kernel: ddd19b70 00000082 034cdfca 00000000 00000001 00000007 f7c2c550 dfa9caa0
Jun 20 10:54:37 hq kernel: ae402975 002779a9 0000830f 00000003 f7c2c660 c20240a0 00000001 00000286
Jun 20 10:54:37 hq kernel: c0125380 a5d7f11b c2116000 00000286 000000ff 00000000 00000000 a5d7f11b
Jun 20 10:54:37 hq kernel: Call Trace:
Jun 20 10:54:37 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
Jun 20 10:54:37 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
Jun 20 10:54:37 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
Jun 20 10:54:37 hq kernel: [<c016a115>] do_select+0x37a/0x3d4
Jun 20 10:54:37 hq kernel: [<c016a677>] __pollwait+0x0/0xb2
Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel: [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
Jun 20 10:54:37 hq kernel: [<c0121c24>] tasklet_action+0x55/0xaf
Jun 20 10:54:37 hq kernel: [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
Jun 20 10:54:37 hq kernel: [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel: [<c02547e4>] tcp_transmit_skb+0x604/0x632
Jun 20 10:54:37 hq kernel: [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel: [<c0165370>] do_lookup+0x4f/0x135
Jun 20 10:54:37 hq kernel: [<c016dbc4>] dput+0x1a/0x11b
Jun 20 10:54:37 hq kernel: [<c0167312>] __link_path_walk+0xbe4/0xd1d
Jun 20 10:54:37 hq kernel: [<c016a3fb>] core_sys_select+0x28c/0x2a9
Jun 20 10:54:37 hq kernel: [<c01674fe>] link_path_walk+0xb3/0xbd
Jun 20 10:54:37 hq kernel: [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
Jun 20 10:54:37 hq kernel: [<c016785d>] do_path_lookup+0x20a/0x225
Jun 20 10:54:37 hq kernel: [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
Jun 20 10:54:37 hq kernel: [<c0161b28>] cp_new_stat64+0xfd/0x10f
Jun 20 10:54:37 hq kernel: [<c016a9c1>] sys_select+0x9f/0x182
Jun 20 10:54:37 hq kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
I guess I also need to make sure I get this same stack trace each time.
Thanks.
Brian May
* Re: open sleeps
2008-06-20 1:47 ` Brian May
@ 2008-06-20 4:11 ` Brian May
2008-06-26 6:29 ` Brian May
0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20 4:11 UTC (permalink / raw)
To: Brian May; +Cc: Christoph Hellwig, xfs
Brian May wrote:
> I guess I also need to make sure I get this same stack trace each time.
Yes. Looks consistent to me.
Brian May.
* Re: open sleeps
2008-06-20 4:11 ` Brian May
@ 2008-06-26 6:29 ` Brian May
0 siblings, 0 replies; 17+ messages in thread
From: Brian May @ 2008-06-26 6:29 UTC (permalink / raw)
To: Brian May; +Cc: Christoph Hellwig, xfs
Hello,
Thanks for your help.
Just a followup: Every indication seems to be that it is the firewall in
the virus scanner we use, F-Secure, that prevents Windows from releasing
oplocks properly. After turning off the firewall, the oplocks get
released and there is no problem. Possibly related to the timing of wpkg
being invoked before system logons too.
Might be time to consider another virus scanner/firewall solution.
Brian May
* Re: open sleeps
2008-06-20 6:45 ` Dave Chinner
@ 2008-06-26 6:30 ` Brian May
0 siblings, 0 replies; 17+ messages in thread
From: Brian May @ 2008-06-26 6:30 UTC (permalink / raw)
To: Brian May, Stephen Rothwell, linux-fsdevel
Dave Chinner wrote:
> This is just the stupid (broken) ia32 stack tracing - it outputs every
> single object it sees in the stack that may be a function call. Hence
> you see some stuff that is from past stack traversals. I'd say this
> is sleeping in a select() call, not doing anything related to
> xfs or networking....
>
Hello,
Thanks for your help.
Just a followup: Every indication seems to be that it is the firewall in
the virus scanner we use, F-Secure, that prevents Windows from releasing
oplocks properly. After turning off the firewall, the oplocks get
released and there is no problem. Possibly related to the timing of wpkg
being invoked before system logons too.
Might be time to consider another virus scanner/firewall solution.
Brian May
* Re: open sleeps
2008-06-20 1:11 ` Brian May
@ 2008-06-20 6:45 ` Dave Chinner
2008-06-26 6:30 ` Brian May
0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2008-06-20 6:45 UTC (permalink / raw)
To: Brian May; +Cc: Stephen Rothwell, linux-fsdevel
On Fri, Jun 20, 2008 at 11:11:54AM +1000, Brian May wrote:
> Brian May wrote:
>> Stephen Rothwell wrote:
>>> It's what samba calls an oplock. It's like a kernel file lock but it
>>> blocks opens and notifies the holder (who is supposed to clean up and
>>> release the lease). If the lease is not released in 45 seconds (by
>>> default) then the kernel assumes that the holder is broken and breaks
>>> the lease and allows the open to proceed.
>>>
>> Ok, so I guess it is most likely Samba's fault then. Most likely this
>> is the only application that uses leases/oplocks on this computer
>> anyway. I will keep investigating...
> Maybe I spoke too soon when I ruled XFS out :-(.
....
> hq:~# echo t > /proc/sysrq-trigger
> Gives a stack trace of:
>
> Jun 20 10:54:37 hq kernel: Call Trace:
>
> Jun 20 10:54:37 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
> Jun 20 10:54:37 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
> Jun 20 10:54:37 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
> Jun 20 10:54:37 hq kernel: [<c016a115>] do_select+0x37a/0x3d4
> Jun 20 10:54:37 hq kernel: [<c016a677>] __pollwait+0x0/0xb2
> Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
> Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
> Jun 20 10:54:37 hq kernel: [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
> Jun 20 10:54:37 hq kernel: [<c0121c24>] tasklet_action+0x55/0xaf
> Jun 20 10:54:37 hq kernel: [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
> Jun 20 10:54:37 hq kernel: [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
> Jun 20 10:54:37 hq kernel: [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
> Jun 20 10:54:37 hq kernel: [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
> Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
> Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
> Jun 20 10:54:37 hq kernel: [<c02547e4>] tcp_transmit_skb+0x604/0x632
> Jun 20 10:54:37 hq kernel: [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
> Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
> Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
> Jun 20 10:54:37 hq kernel: [<c0165370>] do_lookup+0x4f/0x135
> Jun 20 10:54:37 hq kernel: [<c016dbc4>] dput+0x1a/0x11b
> Jun 20 10:54:37 hq kernel: [<c0167312>] __link_path_walk+0xbe4/0xd1d
> Jun 20 10:54:37 hq kernel: [<c016a3fb>] core_sys_select+0x28c/0x2a9
> Jun 20 10:54:37 hq kernel: [<c01674fe>] link_path_walk+0xb3/0xbd
> Jun 20 10:54:37 hq kernel: [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
> Jun 20 10:54:37 hq kernel: [<c016785d>] do_path_lookup+0x20a/0x225
> Jun 20 10:54:37 hq kernel: [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
> Jun 20 10:54:37 hq kernel: [<c0161b28>] cp_new_stat64+0xfd/0x10f
> Jun 20 10:54:37 hq kernel: [<c016a9c1>] sys_select+0x9f/0x182
> Jun 20 10:54:37 hq kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
This is just the stupid (broken) ia32 stack tracing - it outputs every
single object it sees in the stack that may be a function call. Hence
you see some stuff that is from past stack traversals. I'd say this
is sleeping in a select() call, not doing anything related to
xfs or networking....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: open sleeps
2008-06-20 0:50 ` Brian May
@ 2008-06-20 1:11 ` Brian May
2008-06-20 6:45 ` Dave Chinner
0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20 1:11 UTC (permalink / raw)
To: Brian May; +Cc: Stephen Rothwell, linux-fsdevel
Brian May wrote:
> Stephen Rothwell wrote:
>> It's what samba calls an oplock. It's like a kernel file lock but it
>> blocks opens and notifies the holder (who is supposed to clean up and
>> release the lease). If the lease is not released in 45 seconds (by
>> default) then the kernel assumes that the holder is broken and breaks
>> the lease and allows the open to proceed.
>>
> Ok, so I guess it is most likely Samba's fault then. Most likely this
> is the only application that uses leases/oplocks on this computer
> anyway. I will keep investigating...
Maybe I spoke too soon when I ruled XFS out :-(.
hq:~# cat /proc/locks | grep :0a:
5: LEASE ACTIVE WRITE 13516 fd:0a:131 0 EOF
hq:~# ps 13516
PID TTY STAT TIME COMMAND
13516 ? S 0:00 /usr/sbin/smbd -D
hq:~# find /var/lib/wpkg -inum 131
/var/lib/wpkg/cp_status.cmd
hq:~# cat /var/lib/wpkg/cp_status.cmd
[Ctrl-C pushed]
hq:~# cat /proc/locks | grep :0a:
5: LEASE BREAKING READ 13516 fd:0a:131 0 EOF
hq:~# echo t > /proc/sysrq-trigger
Gives a stack trace of:
Jun 20 10:54:37 hq kernel: smbd S 00000000 0 13516 11112 13459 (NOTLB)
Jun 20 10:54:37 hq kernel: ddd19b70 00000082 034cdfca 00000000 00000001 00000007 f7c2c550 dfa9caa0
Jun 20 10:54:37 hq kernel: ae402975 002779a9 0000830f 00000003 f7c2c660 c20240a0 00000001 00000286
Jun 20 10:54:37 hq kernel: c0125380 a5d7f11b c2116000 00000286 000000ff 00000000 00000000 a5d7f11b
Jun 20 10:54:37 hq kernel: Call Trace:
Jun 20 10:54:37 hq kernel: [<c0125380>] lock_timer_base+0x15/0x2f
Jun 20 10:54:37 hq kernel: [<c027f960>] schedule_timeout+0x71/0x8c
Jun 20 10:54:37 hq kernel: [<c0124a81>] process_timeout+0x0/0x5
Jun 20 10:54:37 hq kernel: [<c016a115>] do_select+0x37a/0x3d4
Jun 20 10:54:37 hq kernel: [<c016a677>] __pollwait+0x0/0xb2
Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel: [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel: [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
Jun 20 10:54:37 hq kernel: [<c0121c24>] tasklet_action+0x55/0xaf
Jun 20 10:54:37 hq kernel: [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
Jun 20 10:54:37 hq kernel: [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel: [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel: [<c02547e4>] tcp_transmit_skb+0x604/0x632
Jun 20 10:54:37 hq kernel: [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel: [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel: [<c0165370>] do_lookup+0x4f/0x135
Jun 20 10:54:37 hq kernel: [<c016dbc4>] dput+0x1a/0x11b
Jun 20 10:54:37 hq kernel: [<c0167312>] __link_path_walk+0xbe4/0xd1d
Jun 20 10:54:37 hq kernel: [<c016a3fb>] core_sys_select+0x28c/0x2a9
Jun 20 10:54:37 hq kernel: [<c01674fe>] link_path_walk+0xb3/0xbd
Jun 20 10:54:37 hq kernel: [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
Jun 20 10:54:37 hq kernel: [<c016785d>] do_path_lookup+0x20a/0x225
Jun 20 10:54:37 hq kernel: [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
Jun 20 10:54:37 hq kernel: [<c0161b28>] cp_new_stat64+0xfd/0x10f
Jun 20 10:54:37 hq kernel: [<c016a9c1>] sys_select+0x9f/0x182
Jun 20 10:54:37 hq kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
Why XFS routines would be calling e1000 (isn't that a network driver?) is
very puzzling. This was using a Fibre Channel card (QLogic Corp. QLA2312
Fibre Channel Adapter); would there be any benefit in repeating the same
thing again with the SATA device?
Brian May
* Re: open sleeps
2008-06-20 0:20 ` Stephen Rothwell
@ 2008-06-20 0:50 ` Brian May
2008-06-20 1:11 ` Brian May
0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20 0:50 UTC (permalink / raw)
To: Stephen Rothwell; +Cc: linux-fsdevel
Stephen Rothwell wrote:
> It's what samba calls an oplock. It's like a kernel file lock but it
> blocks opens and notifies the holder (who is supposed to clean up and
> release the lease). If the lease is not released in 45 seconds (by
> default) then the kernel assumes that the holder is broken and breaks the
> lease and allows the open to proceed.
>
Ok, so I guess it is most likely Samba's fault then. Most likely this is
the only application that uses leases/oplocks on this computer anyway. I
will keep investigating...
Thanks.
Brian May
* Re: open sleeps
2008-06-20 0:07 ` Brian May
@ 2008-06-20 0:20 ` Stephen Rothwell
2008-06-20 0:50 ` Brian May
0 siblings, 1 reply; 17+ messages in thread
From: Stephen Rothwell @ 2008-06-20 0:20 UTC (permalink / raw)
To: Brian May; +Cc: linux-fsdevel
On Fri, 20 Jun 2008 10:07:26 +1000 Brian May <brian@vpac.org> wrote:
>
> Stephen Rothwell wrote:
> > So, it would seem some other process has taken write leases on your
> > files but does not release them in a timely manner. You can see the
> > current leases in /proc/locks.
> >
> What is a lease? Is this a kernel level thing or user level thing?
It's what samba calls an oplock. It's like a kernel file lock but it
blocks opens and notifies the holder (who is supposed to clean up and
release the lease). If the lease is not released in 45 seconds (by
default) then the kernel assumes that the holder is broken and breaks the
lease and allows the open to proceed.
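In Python terms, the same mechanism can be sketched like this (a sketch assuming Linux and a file you own; Samba takes its kernel oplocks through the same F_SETLEASE fcntl, though as write leases):

```python
import fcntl
import os

def lease_break_time():
    # The "45 seconds (by default)" is this sysctl,
    # /proc/sys/fs/lease-break-time (tunable at runtime).
    with open("/proc/sys/fs/lease-break-time") as f:
        return int(f.read())

def take_read_lease(path):
    # Take a read lease on a file we own. Until the lease is released
    # (F_SETLEASE with F_UNLCK), a conflicting open() by another process
    # blocks; the kernel signals the holder (SIGIO by default) and, after
    # lease_break_time() seconds, breaks the lease itself so the open
    # can proceed.
    fd = os.open(path, os.O_RDONLY)
    fcntl.fcntl(fd, fcntl.F_SETLEASE, fcntl.F_RDLCK)
    return fd
```

Releasing is `fcntl.fcntl(fd, fcntl.F_SETLEASE, fcntl.F_UNLCK)`.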
> What is the format of /proc/locks? I assume the 4th field is PID, what
> is the next field? How do I work out what is locked?
The format is:

index: type status subtype pid major:minor:inode start end

For leases, type is LEASE; status is ACTIVE, BREAKING or BREAKER; subtype
is UNLCK, READ or WRITE.
If there is a "->" in front of the type, then the process described by
that line is blocked by the first line with the same index.
You can see the inode number of a file using "ls -i".
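As a sketch, that format description can be turned into a small parser (the field names are my own labels, not kernel terminology), checked against a lease line of the form seen in this thread:

```python
def parse_proc_locks_line(line):
    """Parse one /proc/locks line, e.g.
    '5: LEASE  ACTIVE    WRITE 13516 fd:0a:131 0 EOF'
    A '->' after the index marks a process blocked by the lock
    on the first line with the same index."""
    fields = line.split()
    blocked = fields[1] == "->"
    if blocked:
        fields = fields[:1] + fields[2:]  # drop the '->' marker
    major, minor, inode = fields[5].split(":")
    return {
        "index": fields[0].rstrip(":"),
        "blocked": blocked,
        "type": fields[1],      # FLOCK, POSIX or LEASE
        "status": fields[2],    # for leases: ACTIVE, BREAKING or BREAKER
        "subtype": fields[3],   # for leases: UNLCK, READ or WRITE
        "pid": int(fields[4]),
        "dev": f"{major}:{minor}",
        "inode": int(inode),
        "start": fields[6],
        "end": fields[7],       # byte range; EOF means to end of file
    }
```

For instance, the smbd lease later found in this thread parses to pid 13516 holding an ACTIVE WRITE LEASE on inode 131 of device fd:0a.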
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
* Re: open sleeps
2008-06-19 23:58 ` Stephen Rothwell
@ 2008-06-20 0:07 ` Brian May
2008-06-20 0:20 ` Stephen Rothwell
0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20 0:07 UTC (permalink / raw)
To: Stephen Rothwell; +Cc: linux-fsdevel
Stephen Rothwell wrote:
> So, it would seem some other process has taken write leases on your
> files but does not release them in a timely manner. You can see the
> current leases in /proc/locks.
>
What is a lease? Is this a kernel-level thing or a user-level thing?
What is the format of /proc/locks? I assume the 4th field is the PID; what
is the next field? How do I work out what is locked?
Thanks.
Brian May
* Re: open sleeps
2008-06-19 23:14 Brian May
@ 2008-06-19 23:58 ` Stephen Rothwell
2008-06-20 0:07 ` Brian May
0 siblings, 1 reply; 17+ messages in thread
From: Stephen Rothwell @ 2008-06-19 23:58 UTC (permalink / raw)
To: Brian May; +Cc: linux-fsdevel
Hi Brian,
On Fri, 20 Jun 2008 09:14:46 +1000 Brian May <brian@vpac.org> wrote:
>
> I am having (weird) issues with XFS, in that open(...) on certain files
> takes 45 seconds to return. After the file has been opened, the next
> file in the same directory takes 45 seconds. If the file was recently
> opened it returns immediately.
>
> I thought this was a low level I/O issue, so copied the files in
> question to a completely independent RAID array (separate LVM, RAID,
> controllers, disks), but the problem remains.
>
> I raised the issue on the XFS mailing list and was told the problem was
> the lease breaking code in VFS[2], and that I should contact this
> mailing list.
So, it would seem some other process has taken write leases on your
files but does not release them in a timely manner. You can see the
current leases in /proc/locks.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
* open sleeps
@ 2008-06-19 23:14 Brian May
2008-06-19 23:58 ` Stephen Rothwell
0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-19 23:14 UTC (permalink / raw)
To: linux-fsdevel
Hello,
I am having (weird) issues with XFS, in that open(...) on certain files
takes 45 seconds to return. After the file has been opened, the next
file in the same directory takes 45 seconds. If the file was recently
opened it returns immediately.
I thought this was a low level I/O issue, so copied the files in
question to a completely independent RAID array (separate LVM, RAID,
controllers, disks), but the problem remains.
I raised the issue on the XFS mailing list and was told the problem was
the lease breaking code in VFS[2], and that I should contact this
mailing list.
This is with the kernel in Debian/Stable:
hq:/tmp# uname -a
Linux hq 2.6.18-6-686 #1 SMP Sun Feb 10 22:11:31 UTC 2008 i686 GNU/Linux
More details:
[1] <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>
[2] <http://oss.sgi.com/archives/xfs/2008-06/msg00222.html>
Any ideas?
Please CC me responses.
Thanks.
Brian May
end of thread, other threads:[~2008-06-26 6:30 UTC | newest]
Thread overview: 17+ messages
2008-06-19 5:27 open sleeps Brian May
2008-06-19 5:30 ` Brian May
2008-06-19 6:21 ` Dave Chinner
2008-06-19 6:40 ` Brian May
2008-06-19 8:43 ` Christoph Hellwig
2008-06-19 9:48 ` Sebastian Brings
2008-06-20 1:47 ` Brian May
2008-06-20 4:11 ` Brian May
2008-06-26 6:29 ` Brian May
2008-06-19 23:14 Brian May
2008-06-19 23:58 ` Stephen Rothwell
2008-06-20 0:07 ` Brian May
2008-06-20 0:20 ` Stephen Rothwell
2008-06-20 0:50 ` Brian May
2008-06-20 1:11 ` Brian May
2008-06-20 6:45 ` Dave Chinner
2008-06-26 6:30 ` Brian May