* open sleeps
@ 2008-06-19  5:27 Brian May
  2008-06-19  5:30 ` Brian May
  2008-06-19  6:21 ` Dave Chinner
  0 siblings, 2 replies; 17+ messages in thread
From: Brian May @ 2008-06-19  5:27 UTC (permalink / raw)
  To: xfs

Hello,

I am having (weird) issues with XFS, in that open(...) on certain files 
takes 45 seconds to return. After the file has been opened, the next 
file in the same directory takes 45 seconds. If the file was recently 
opened it returns immediately.

I thought this was a low level I/O issue, so copied the files in 
question to a completely independent RAID array  (separate LVM, RAID, 
controllers, disks), but the problem remains.

More details at thread starting from 
<http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.

Any ideas?

Please CC me responses.

Thanks.

Brian May


* Re: open sleeps
  2008-06-19  5:27 open sleeps Brian May
@ 2008-06-19  5:30 ` Brian May
  2008-06-19  6:21 ` Dave Chinner
  1 sibling, 0 replies; 17+ messages in thread
From: Brian May @ 2008-06-19  5:30 UTC (permalink / raw)
  To: Brian May; +Cc: xfs

Sorry for the repost with the weird Followup-To header (I was hoping 
Thunderbird would use Mail-Followup-To, but obviously not).

Brian May wrote:
> Hello,
>
> I am having (weird) issues with XFS, in that open(...) on certain 
> files takes 45 seconds to return. After the file has been opened, the 
> next file in the same directory takes 45 seconds. If the file was 
> recently opened it returns immediately.
>
> I thought this was a low level I/O issue, so copied the files in 
> question to a completely independent RAID array  (separate LVM, RAID, 
> controllers, disks), but the problem remains.
>
> More details at thread starting from 
> <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
>
> Any ideas?
>
> Please CC me responses.
>
> Thanks.
>
> Brian May


* Re: open sleeps
  2008-06-19  5:27 open sleeps Brian May
  2008-06-19  5:30 ` Brian May
@ 2008-06-19  6:21 ` Dave Chinner
  2008-06-19  6:40   ` Brian May
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2008-06-19  6:21 UTC (permalink / raw)
  To: Brian May; +Cc: xfs

On Thu, Jun 19, 2008 at 03:27:48PM +1000, Brian May wrote:
> Hello,
>
> I am having (weird) issues with XFS, in that open(...) on certain files  
> takes 45 seconds to return. After the file has been opened, the next  
> file in the same directory takes 45 seconds. If the file was recently  
> opened it returns immediately.
>
> I thought this was a low level I/O issue, so copied the files in  
> question to a completely independent RAID array  (separate LVM, RAID,  
> controllers, disks), but the problem remains.
>
> More details at thread starting from  
> <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
>
> Any ideas?

# echo t > /proc/sysrq-trigger

when it is hung to get a stack trace of the blocked open call.
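
For example, something like this (the sysrq knobs are standard; the
task name to grep for is whatever your blocked command is, shown here
as a <task> placeholder):

# echo 1 > /proc/sys/kernel/sysrq
# echo t > /proc/sysrq-trigger
# dmesg | grep -A 20 '<task>'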

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: open sleeps
  2008-06-19  6:21 ` Dave Chinner
@ 2008-06-19  6:40   ` Brian May
  2008-06-19  8:43     ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-19  6:40 UTC (permalink / raw)
  To: Brian May, xfs

Dave Chinner wrote:
> On Thu, Jun 19, 2008 at 03:27:48PM +1000, Brian May wrote:
>   
>> Hello,
>>
>> I am having (weird) issues with XFS, in that open(...) on certain files  
>> takes 45 seconds to return. After the file has been opened, the next  
>> file in the same directory takes 45 seconds. If the file was recently  
>> opened it returns immediately.
>>
>> I thought this was a low level I/O issue, so copied the files in  
>> question to a completely independent RAID array  (separate LVM, RAID,  
>> controllers, disks), but the problem remains.
>>
>> More details at thread starting from  
>> <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>.
>>
>> Any ideas?
>>     
>
> # echo t > /proc/sysrq-trigger
>
> when it is hung to get a stack trace of the blocked open call.
>
> Cheers,
>
> Dave.
>   
Does the following help? I still have the logs of the other processes, 
if required (just in case it is some weird interaction between multiple 
processes?)

It seems to be pretty consistent with lock_timer_base, every time I look 
(assuming I haven't read the stack trace upside down...).

Jun 19 16:33:30 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
Jun 19 16:33:30 hq kernel:        f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550 
Jun 19 16:33:30 hq kernel:        34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286 
Jun 19 16:33:30 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b 
Jun 19 16:33:30 hq kernel: Call Trace:
Jun 19 16:33:30 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
Jun 19 16:33:30 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
Jun 19 16:33:30 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
Jun 19 16:33:30 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9
Jun 19 16:33:30 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 19 16:33:30 hq kernel:  [<c0166450>] may_open+0x125/0x203
Jun 19 16:33:30 hq kernel:  [<c0168451>] open_namei+0x23d/0x5c8
Jun 19 16:33:30 hq kernel:  [<c0158cb8>] do_filp_open+0x1c/0x31
Jun 19 16:33:30 hq kernel:  [<c0161f38>] sys_stat64+0x1e/0x23
Jun 19 16:33:30 hq kernel:  [<c0158d0b>] do_sys_open+0x3e/0xb3
Jun 19 16:33:30 hq kernel:  [<c0158dad>] sys_open+0x16/0x18
Jun 19 16:33:30 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79

Jun 19 16:33:50 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
Jun 19 16:33:50 hq kernel:        f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550 
Jun 19 16:33:50 hq kernel:        34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286 
Jun 19 16:33:50 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b 
Jun 19 16:33:50 hq kernel: Call Trace:
Jun 19 16:33:50 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
Jun 19 16:33:50 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
Jun 19 16:33:50 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
Jun 19 16:33:50 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9
Jun 19 16:33:50 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 19 16:33:50 hq kernel:  [<c0166450>] may_open+0x125/0x203
Jun 19 16:33:50 hq kernel:  [<c0168451>] open_namei+0x23d/0x5c8
Jun 19 16:33:50 hq kernel:  [<c0158cb8>] do_filp_open+0x1c/0x31
Jun 19 16:33:50 hq kernel:  [<c0161f38>] sys_stat64+0x1e/0x23
Jun 19 16:33:50 hq kernel:  [<c0158d0b>] do_sys_open+0x3e/0xb3
Jun 19 16:33:50 hq kernel:  [<c0158dad>] sys_open+0x16/0x18
Jun 19 16:33:50 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79


Thanks.

Brian May


* Re: open sleeps
  2008-06-19  6:40   ` Brian May
@ 2008-06-19  8:43     ` Christoph Hellwig
  2008-06-19  9:48       ` Sebastian Brings
  2008-06-20  1:47       ` Brian May
  0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2008-06-19  8:43 UTC (permalink / raw)
  To: Brian May; +Cc: xfs

On Thu, Jun 19, 2008 at 04:40:00PM +1000, Brian May wrote:
> Does the following help? I still have the logs of the other processes, if 
> required (just in case it is some weird interaction between multiple 
> processes?)
>
> It seems to be pretty consistent with lock_timer_base, every time I look 
> (assuming I haven't read the stack trace upside down...).
>
> Jun 19 16:33:30 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
> Jun 19 16:33:30 hq kernel:        f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550 
> Jun 19 16:33:30 hq kernel:        34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286 
> Jun 19 16:33:30 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b 
> Jun 19 16:33:30 hq kernel: Call Trace:
> Jun 19 16:33:30 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
> Jun 19 16:33:30 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
> Jun 19 16:33:30 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
> Jun 19 16:33:30 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9

That's the lease breaking code in the VFS, long before we call
into XFS.  Looks like someone (samba?) has a lease on this file and
we're having trouble getting it broken.  Try sending a report about
this to linux-fsdevel@vger.kernel.org
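
To see the mechanism in isolation, here is a minimal sketch of taking
a write lease with fcntl(2) (this is not samba's actual code; the file
name is whatever you pass it, and error handling is trimmed).  While it
holds the lease, an open(2) of the same file from a second process sits
in __break_lease, just like the trace above, until the lease is
released or the kernel times out:

/* lease-demo.c -- illustrative sketch only; build: cc -o lease-demo lease-demo.c */
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t broken;

/* The kernel sends SIGIO (by default) when someone wants the lease broken. */
static void on_sigio(int sig)
{
        (void)sig;
        broken = 1;
}

int main(int argc, char **argv)
{
        int fd;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }
        signal(SIGIO, on_sigio);
        fd = open(argv[1], O_RDONLY);
        /* A write lease requires owning the file (or CAP_LEASE) and no
         * other open file descriptors on it. */
        if (fd < 0 || fcntl(fd, F_SETLEASE, F_WRLCK) < 0) {
                perror("lease");
                return 1;
        }
        printf("holding write lease; open(2) the file elsewhere to see it block\n");
        while (!broken)
                pause();                /* a conflicting open wakes us here */
        fcntl(fd, F_SETLEASE, F_UNLCK); /* release; the blocked open proceeds */
        return 0;
}

If the holder never releases the lease, the blocked open only proceeds
once the kernel's lease-break timeout expires, which is where the 45
seconds comes from.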


* RE: open sleeps
  2008-06-19  8:43     ` Christoph Hellwig
@ 2008-06-19  9:48       ` Sebastian Brings
  2008-06-20  1:47       ` Brian May
  1 sibling, 0 replies; 17+ messages in thread
From: Sebastian Brings @ 2008-06-19  9:48 UTC (permalink / raw)
  To: xfs



> -----Original Message-----
> From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com]
> On Behalf Of Christoph Hellwig
> Sent: Donnerstag, 19. Juni 2008 10:43
> To: Brian May
> Cc: xfs@oss.sgi.com
> Subject: Re: open sleeps
>
> On Thu, Jun 19, 2008 at 04:40:00PM +1000, Brian May wrote:
> > Does the following help? I still have the logs of the other processes, 
> > if required (just in case it is some weird interaction between multiple 
> > processes?)
> >
> > It seems to be pretty consistent with lock_timer_base, every time I 
> > look (assuming I haven't read the stack trace upside down...).
> >
> > Jun 19 16:33:30 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
> > Jun 19 16:33:30 hq kernel:        f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550 
> > Jun 19 16:33:30 hq kernel:        34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286 
> > Jun 19 16:33:30 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b 
> > Jun 19 16:33:30 hq kernel: Call Trace:
> > Jun 19 16:33:30 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
> > Jun 19 16:33:30 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
> > Jun 19 16:33:30 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
> > Jun 19 16:33:30 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9
>
> That's the lease breaking code in the VFS, long before we call
> into XFS.  Looks like someone (samba?) has a lease on this file and
> we're having trouble getting it broken.  Try sending a report about
> this to linux-fsdevel@vger.kernel.org
>
>

Out of curiosity: is the lease breaking you mention the breaking of a kernel oplock that samba may hold?


* Re: open sleeps
  2008-06-19  8:43     ` Christoph Hellwig
  2008-06-19  9:48       ` Sebastian Brings
@ 2008-06-20  1:47       ` Brian May
  2008-06-20  4:11         ` Brian May
  1 sibling, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20  1:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Christoph Hellwig wrote:
> On Thu, Jun 19, 2008 at 04:40:00PM +1000, Brian May wrote:
>   
>> Does the following help? I still have the logs of the other processes, if 
>> required (just in case it is some weird interaction between multiple 
>> processes?)
>>
>> It seems to be pretty consistent with lock_timer_base, every time I look 
>> (assuming I haven't read the stack trace upside down...).
>>
>> Jun 19 16:33:30 hq kernel: grep          S 00000000     0 12793  12567                     (NOTLB)
>> Jun 19 16:33:30 hq kernel:        f0c23e7c 00200082 000a1089 00000000 00000010 00000008 cd0db550 dfa97550 
>> Jun 19 16:33:30 hq kernel:        34f84262 00273db2 0008a1dc 00000001 cd0db660 c20140a0 dfe1cbe8 00200286 
>> Jun 19 16:33:30 hq kernel:        c0125380 a4dbf26b dfa6a000 00200286 000000ff 00000000 00000000 a4dbf26b 
>> Jun 19 16:33:30 hq kernel: Call Trace:
>> Jun 19 16:33:30 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
>> Jun 19 16:33:30 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
>> Jun 19 16:33:30 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
>> Jun 19 16:33:30 hq kernel:  [<c016c801>] __break_lease+0x2a8/0x2b9
>
> That's the lease breaking code in the VFS, long before we call
> into XFS.  Looks like someone (samba?) has a lease on this file and
> we're having trouble getting it broken.  Try sending a report about
> this to linux-fsdevel@vger.kernel.org
>   
I feel I am going around in circles.

Anyway, I started a discussion at 
<http://www.archivum.info/linux-fsdevel@vger.kernel.org/2008-06/msg00337.html>.

In the last message (which isn't archived yet), I looked at the Samba 
process that is holding the lease. The following is its stack trace. I 
don't understand why the XFS code would be calling e1000 code; the 
filesystem isn't attached via the network. Perhaps this means the 
problem is with the network code?

Jun 20 10:54:37 hq kernel: smbd          S 00000000     0 13516  11112               13459 (NOTLB)
Jun 20 10:54:37 hq kernel:        ddd19b70 00000082 034cdfca 00000000 00000001 00000007 f7c2c550 dfa9caa0 
Jun 20 10:54:37 hq kernel:        ae402975 002779a9 0000830f 00000003 f7c2c660 c20240a0 00000001 00000286 
Jun 20 10:54:37 hq kernel:        c0125380 a5d7f11b c2116000 00000286 000000ff 00000000 00000000 a5d7f11b 
Jun 20 10:54:37 hq kernel: Call Trace:
Jun 20 10:54:37 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
Jun 20 10:54:37 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
Jun 20 10:54:37 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
Jun 20 10:54:37 hq kernel:  [<c016a115>] do_select+0x37a/0x3d4
Jun 20 10:54:37 hq kernel:  [<c016a677>] __pollwait+0x0/0xb2
Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel:  [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
Jun 20 10:54:37 hq kernel:  [<c0121c24>] tasklet_action+0x55/0xaf
Jun 20 10:54:37 hq kernel:  [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
Jun 20 10:54:37 hq kernel:  [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel:  [<c02547e4>] tcp_transmit_skb+0x604/0x632
Jun 20 10:54:37 hq kernel:  [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel:  [<c0165370>] do_lookup+0x4f/0x135
Jun 20 10:54:37 hq kernel:  [<c016dbc4>] dput+0x1a/0x11b
Jun 20 10:54:37 hq kernel:  [<c0167312>] __link_path_walk+0xbe4/0xd1d
Jun 20 10:54:37 hq kernel:  [<c016a3fb>] core_sys_select+0x28c/0x2a9
Jun 20 10:54:37 hq kernel:  [<c01674fe>] link_path_walk+0xb3/0xbd
Jun 20 10:54:37 hq kernel:  [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
Jun 20 10:54:37 hq kernel:  [<c016785d>] do_path_lookup+0x20a/0x225
Jun 20 10:54:37 hq kernel:  [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
Jun 20 10:54:37 hq kernel:  [<c0161b28>] cp_new_stat64+0xfd/0x10f
Jun 20 10:54:37 hq kernel:  [<c016a9c1>] sys_select+0x9f/0x182
Jun 20 10:54:37 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79



I guess I also need to make sure I get this same stack trace each time.

Thanks.
Brian May


* Re: open sleeps
  2008-06-20  1:47       ` Brian May
@ 2008-06-20  4:11         ` Brian May
  2008-06-26  6:29           ` Brian May
  0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20  4:11 UTC (permalink / raw)
  To: Brian May; +Cc: Christoph Hellwig, xfs

Brian May wrote:
> I guess I also need to make sure I get this same stack trace each time.
Yes. Looks consistent to me.

Brian May.


* Re: open sleeps
  2008-06-20  4:11         ` Brian May
@ 2008-06-26  6:29           ` Brian May
  0 siblings, 0 replies; 17+ messages in thread
From: Brian May @ 2008-06-26  6:29 UTC (permalink / raw)
  To: Brian May; +Cc: Christoph Hellwig, xfs

Hello,

Thanks for your help.

Just a followup: Every indication seems to be that it is the firewall in 
the virus scanner we use, F-Secure, that prevents Windows from releasing 
oplocks properly. After turning off the firewall, the oplocks get 
released and there is no problem. Possibly related to the timing of wpkg 
being invoked before system logons too.

Might be time to consider another virus scanner/firewall solution.

Brian May


* Re: open sleeps
  2008-06-20  6:45           ` Dave Chinner
@ 2008-06-26  6:30             ` Brian May
  0 siblings, 0 replies; 17+ messages in thread
From: Brian May @ 2008-06-26  6:30 UTC (permalink / raw)
  To: Brian May, Stephen Rothwell, linux-fsdevel

Dave Chinner wrote:
> This is just the stupid (broken) ia32 stack tracing - it outputs every
> single object it sees on the stack that may be a function call. Hence
> you see some stuff that is from past stack traversals. I'd say this
> is sleeping in a select() call, not doing anything related to
> xfs or networking....
>   
Hello,

Thanks for your help.

Just a followup: Every indication seems to be that it is the firewall in 
the virus scanner we use, F-Secure, that prevents Windows from releasing 
oplocks properly. After turning off the firewall, the oplocks get 
released and there is no problem. Possibly related to the timing of wpkg 
being invoked before system logons too.

Might be time to consider another virus scanner/firewall solution.

Brian May


* Re: open sleeps
  2008-06-20  1:11         ` Brian May
@ 2008-06-20  6:45           ` Dave Chinner
  2008-06-26  6:30             ` Brian May
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2008-06-20  6:45 UTC (permalink / raw)
  To: Brian May; +Cc: Stephen Rothwell, linux-fsdevel

On Fri, Jun 20, 2008 at 11:11:54AM +1000, Brian May wrote:
> Brian May wrote:
>> Stephen Rothwell wrote:
>>> It's what samba calls an oplock.  It's like a kernel file lock, but it
>>> blocks opens and notifies the holder (who is supposed to clean up and
>>> release the lease).  If the lease is not released within 45 seconds (by
>>> default), the kernel assumes that the holder is broken, breaks the
>>> lease, and allows the open to proceed.
>>>   
>> Ok, so I guess it is most likely Samba's fault then. Most likely this  
>> is the only application that uses leases/oplocks on this computer  
>> anyway. I will keep investigating...
> Maybe I spoke too soon when I ruled XFS out :-(.
....
> hq:~# echo t > /proc/sysrq-trigger
> Gives a stack trace of:
>
> Jun 20 10:54:37 hq kernel: Call Trace:
>
> Jun 20 10:54:37 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
> Jun 20 10:54:37 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
> Jun 20 10:54:37 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
> Jun 20 10:54:37 hq kernel:  [<c016a115>] do_select+0x37a/0x3d4
> Jun 20 10:54:37 hq kernel:  [<c016a677>] __pollwait+0x0/0xb2
> Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
> Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
> Jun 20 10:54:37 hq kernel:  [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
> Jun 20 10:54:37 hq kernel:  [<c0121c24>] tasklet_action+0x55/0xaf
> Jun 20 10:54:37 hq kernel:  [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
> Jun 20 10:54:37 hq kernel:  [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
> Jun 20 10:54:37 hq kernel:  [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
> Jun 20 10:54:37 hq kernel:  [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
> Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
> Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
> Jun 20 10:54:37 hq kernel:  [<c02547e4>] tcp_transmit_skb+0x604/0x632
> Jun 20 10:54:37 hq kernel:  [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
> Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
> Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
> Jun 20 10:54:37 hq kernel:  [<c0165370>] do_lookup+0x4f/0x135
> Jun 20 10:54:37 hq kernel:  [<c016dbc4>] dput+0x1a/0x11b
> Jun 20 10:54:37 hq kernel:  [<c0167312>] __link_path_walk+0xbe4/0xd1d
> Jun 20 10:54:37 hq kernel:  [<c016a3fb>] core_sys_select+0x28c/0x2a9
> Jun 20 10:54:37 hq kernel:  [<c01674fe>] link_path_walk+0xb3/0xbd
> Jun 20 10:54:37 hq kernel:  [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
> Jun 20 10:54:37 hq kernel:  [<c016785d>] do_path_lookup+0x20a/0x225
> Jun 20 10:54:37 hq kernel:  [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
> Jun 20 10:54:37 hq kernel:  [<c0161b28>] cp_new_stat64+0xfd/0x10f
> Jun 20 10:54:37 hq kernel:  [<c016a9c1>] sys_select+0x9f/0x182
> Jun 20 10:54:37 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79

This is just the stupid (broken) ia32 stack tracing - it outputs every
single object it sees on the stack that may be a function call. Hence
you see some stuff that is from past stack traversals. I'd say this
is sleeping in a select() call, not doing anything related to
xfs or networking....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: open sleeps
  2008-06-20  0:50       ` Brian May
@ 2008-06-20  1:11         ` Brian May
  2008-06-20  6:45           ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20  1:11 UTC (permalink / raw)
  To: Brian May; +Cc: Stephen Rothwell, linux-fsdevel

Brian May wrote:
> Stephen Rothwell wrote:
>> It's what samba calls an oplock.  It's like a kernel file lock, but it
>> blocks opens and notifies the holder (who is supposed to clean up and
>> release the lease).  If the lease is not released within 45 seconds (by
>> default), the kernel assumes that the holder is broken, breaks the
>> lease, and allows the open to proceed.
>>   
> Ok, so I guess it is most likely Samba's fault then. Most likely this 
> is the only application that uses leases/oplocks on this computer 
> anyway. I will keep investigating...
Maybe I spoke too soon when I ruled XFS out :-(.


hq:~# cat /proc/locks | grep :0a:
5: LEASE  ACTIVE    WRITE 13516 fd:0a:131 0 EOF

hq:~# ps  13516
  PID TTY      STAT   TIME COMMAND
13516 ?        S      0:00 /usr/sbin/smbd -D

hq:~# find /var/lib/wpkg -inum 131
/var/lib/wpkg/cp_status.cmd

hq:~# cat /var/lib/wpkg/cp_status.cmd
[Ctrl-C pushed]

hq:~# cat /proc/locks | grep :0a:
5: LEASE  BREAKING  READ  13516 fd:0a:131 0 EOF

hq:~# echo t > /proc/sysrq-trigger


Gives a stack trace of:

Jun 20 10:54:37 hq kernel: smbd          S 00000000     0 13516  11112               13459 (NOTLB)
Jun 20 10:54:37 hq kernel:        ddd19b70 00000082 034cdfca 00000000 00000001 00000007 f7c2c550 dfa9caa0 
Jun 20 10:54:37 hq kernel:        ae402975 002779a9 0000830f 00000003 f7c2c660 c20240a0 00000001 00000286 
Jun 20 10:54:37 hq kernel:        c0125380 a5d7f11b c2116000 00000286 000000ff 00000000 00000000 a5d7f11b 
Jun 20 10:54:37 hq kernel: Call Trace:
Jun 20 10:54:37 hq kernel:  [<c0125380>] lock_timer_base+0x15/0x2f
Jun 20 10:54:37 hq kernel:  [<c027f960>] schedule_timeout+0x71/0x8c
Jun 20 10:54:37 hq kernel:  [<c0124a81>] process_timeout+0x0/0x5
Jun 20 10:54:37 hq kernel:  [<c016a115>] do_select+0x37a/0x3d4
Jun 20 10:54:37 hq kernel:  [<c016a677>] __pollwait+0x0/0xb2
Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel:  [<c0117778>] default_wake_function+0x0/0xc
Jun 20 10:54:37 hq kernel:  [<f88e998f>] e1000_xmit_frame+0x928/0x958 [e1000]
Jun 20 10:54:37 hq kernel:  [<c0121c24>] tasklet_action+0x55/0xaf
Jun 20 10:54:37 hq kernel:  [<c022950a>] dev_hard_start_xmit+0x19a/0x1f0
Jun 20 10:54:37 hq kernel:  [<f8ae3e6d>] xfs_iext_bno_to_ext+0xd8/0x191 [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7aec>] xfs_bmap_search_multi_extents+0xa8/0xc5 [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7b52>] xfs_bmap_search_extents+0x49/0xbe [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel:  [<f8ac7e35>] xfs_bmapi+0x26e/0x20ce [xfs]
Jun 20 10:54:37 hq kernel:  [<c02547e4>] tcp_transmit_skb+0x604/0x632
Jun 20 10:54:37 hq kernel:  [<c02560d3>] __tcp_push_pending_frames+0x6a2/0x758
Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel:  [<c016d84e>] __d_lookup+0x98/0xdb
Jun 20 10:54:37 hq kernel:  [<c0165370>] do_lookup+0x4f/0x135
Jun 20 10:54:37 hq kernel:  [<c016dbc4>] dput+0x1a/0x11b
Jun 20 10:54:37 hq kernel:  [<c0167312>] __link_path_walk+0xbe4/0xd1d
Jun 20 10:54:37 hq kernel:  [<c016a3fb>] core_sys_select+0x28c/0x2a9
Jun 20 10:54:37 hq kernel:  [<c01674fe>] link_path_walk+0xb3/0xbd
Jun 20 10:54:37 hq kernel:  [<f8afbea1>] xfs_inactive_free_eofblocks+0xdf/0x23f [xfs]
Jun 20 10:54:37 hq kernel:  [<c016785d>] do_path_lookup+0x20a/0x225
Jun 20 10:54:37 hq kernel:  [<f8b07de5>] xfs_vn_getattr+0x27/0x2f [xfs]
Jun 20 10:54:37 hq kernel:  [<c0161b28>] cp_new_stat64+0xfd/0x10f
Jun 20 10:54:37 hq kernel:  [<c016a9c1>] sys_select+0x9f/0x182
Jun 20 10:54:37 hq kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79


Why xfs routines appear to be calling e1000 (isn't that a network 
driver?) is very puzzling. This was using a Fibre Channel card (QLogic 
Corp. QLA2312 Fibre Channel Adapter); would there be any benefit in 
repeating the same thing with the SATA device?

Brian May


* Re: open sleeps
  2008-06-20  0:20     ` Stephen Rothwell
@ 2008-06-20  0:50       ` Brian May
  2008-06-20  1:11         ` Brian May
  0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20  0:50 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-fsdevel

Stephen Rothwell wrote:
> It's what samba calls an oplock.  It's like a kernel file lock, but it
> blocks opens and notifies the holder (who is supposed to clean up and
> release the lease).  If the lease is not released within 45 seconds (by
> default), the kernel assumes that the holder is broken, breaks the
> lease, and allows the open to proceed.
>   
Ok, so I guess it is most likely Samba's fault then. Most likely this is 
the only application that uses leases/oplocks on this computer anyway. I 
will keep investigating...

Thanks.

Brian May




* Re: open sleeps
  2008-06-20  0:07   ` Brian May
@ 2008-06-20  0:20     ` Stephen Rothwell
  2008-06-20  0:50       ` Brian May
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Rothwell @ 2008-06-20  0:20 UTC (permalink / raw)
  To: Brian May; +Cc: linux-fsdevel


On Fri, 20 Jun 2008 10:07:26 +1000 Brian May <brian@vpac.org> wrote:
>
> Stephen Rothwell wrote:
> > So, it would seem some other process has taken write leases on your
> > files but does not release them in a timely manner.  You can see the
> > current leases in /proc/locks.
> >   
> What is a lease? Is this a kernel-level thing or a user-level thing?

It's what samba calls an oplock.  It's like a kernel file lock, but it
blocks opens and notifies the holder (who is supposed to clean up and
release the lease).  If the lease is not released within 45 seconds (by
default), the kernel assumes that the holder is broken, breaks the
lease, and allows the open to proceed.
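
That 45 second default is the fs.lease-break-time sysctl, so you can
check it, or shorten it while debugging (the value is in seconds):

# cat /proc/sys/fs/lease-break-time
45
# echo 10 > /proc/sys/fs/lease-break-time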

> What is the format of /proc/locks? I assume the 4th field is the PID; 
> what is the next field? How do I work out what is locked?

The format is:

index: type status subtype pid major:minor:inode start end

For leases, type is LEASE, status is ACTIVE, BREAKING or BREAKER, and
subtype is UNLCK, READ or WRITE.

If there is a "->" in front of the type, then the process described by
that line is blocked by the lock on the first line with the same index.

You can see the inode number of a file using "ls -i".
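
For example, given a (made-up) lease entry like:

5: LEASE  ACTIVE    WRITE 13516 fd:0a:131 0 EOF

the holder is pid 13516 and the file is inode 131 on the device with
major:minor fd:0a, which you can then locate on that filesystem's
mount point:

# find /path/to/that/mountpoint -inum 131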

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



* Re: open sleeps
  2008-06-19 23:58 ` Stephen Rothwell
@ 2008-06-20  0:07   ` Brian May
  2008-06-20  0:20     ` Stephen Rothwell
  0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-20  0:07 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-fsdevel

Stephen Rothwell wrote:
> So, it would seem some other process has taken write leases on your
> files but does not release them in a timely manner.  You can see the
> current leases in /proc/locks.
>   
What is a lease? Is this a kernel-level thing or a user-level thing?

What is the format of /proc/locks? I assume the 4th field is the PID; 
what is the next field? How do I work out what is locked?

Thanks.

Brian May


* Re: open sleeps
  2008-06-19 23:14 Brian May
@ 2008-06-19 23:58 ` Stephen Rothwell
  2008-06-20  0:07   ` Brian May
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Rothwell @ 2008-06-19 23:58 UTC (permalink / raw)
  To: Brian May; +Cc: linux-fsdevel


Hi Brian,

On Fri, 20 Jun 2008 09:14:46 +1000 Brian May <brian@vpac.org> wrote:
>
> I am having (weird) issues with XFS, in that open(...) on certain files 
> takes 45 seconds to return. After the file has been opened, the next 
> file in the same directory takes 45 seconds. If the file was recently 
> opened it returns immediately.
> 
> I thought this was a low level I/O issue, so copied the files in 
> question to a completely independent RAID array (separate LVM, RAID, 
> controllers, disks), but the problem remains.
> 
> I raised the issue on the XFS mailing list and was told the problem was 
> the lease breaking code in the VFS [2], and that I should contact this 
> mailing list.

So, it would seem some other process has taken write leases on your
files but does not release them in a timely manner.  You can see the
current leases in /proc/locks.
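
For example:

# grep LEASE /proc/locks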

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



* open sleeps
@ 2008-06-19 23:14 Brian May
  2008-06-19 23:58 ` Stephen Rothwell
  0 siblings, 1 reply; 17+ messages in thread
From: Brian May @ 2008-06-19 23:14 UTC (permalink / raw)
  To: linux-fsdevel

Hello,

I am having (weird) issues with XFS, in that open(...) on certain files 
takes 45 seconds to return. After the file has been opened, the next 
file in the same directory takes 45 seconds. If the file was recently 
opened it returns immediately.

I thought this was a low level I/O issue, so copied the files in 
question to a completely independent RAID array (separate LVM, RAID, 
controllers, disks), but the problem remains.

I raised the issue on the XFS mailing list and was told the problem was 
the lease breaking code in the VFS [2], and that I should contact this 
mailing list.

This is with the kernel in Debian/Stable:

hq:/tmp# uname -a
Linux hq 2.6.18-6-686 #1 SMP Sun Feb 10 22:11:31 UTC 2008 i686 GNU/Linux

More details:
[1] <http://lists.luv.asn.au/wws/arc/luv-main/2008-06/msg00143.html>
[2] <http://oss.sgi.com/archives/xfs/2008-06/msg00222.html>

Any ideas?

Please CC me responses.

Thanks.

Brian May


