linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.4.19+trond and diskless locking problems
@ 2002-10-03 21:44 Christian Reis
  2002-10-03 22:47 ` Trond Myklebust
  2002-10-04  7:07 ` [NFS] " Juergen Hasch
  0 siblings, 2 replies; 17+ messages in thread
From: Christian Reis @ 2002-10-03 21:44 UTC (permalink / raw)
  To: NFS; +Cc: linux-kernel

Hey there,

We've got a network with about 20 diskless boxes at a client office.
They've been running 2.4.19 (both the knfsd server and the clients) with
Trond's patches for a while now, and though performance is really nice,
occasionally one box or another will end up with a strange locking
problem. 

At bootup, and at shutdown, and for certain other tasks (reading utmp,
etc) the box hangs for a long while (during which I suspect it is trying
to lock). It hangs for about 300 seconds and then goes on normally.

When this happens, there is always a file left in /var/lib/nfs/sm
(normally there are no files in there for none of the clients, even when
they are on) for the hanging box. Is this normal?

We also occasionally get a log message in the server for this box like:

    kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot monitor 192.168.99.7

Trond, can I get you more troubleshooting information, or should I try
2.4.20-pre on server *and* clients? This is a bit wierd, but since I
don't know a lot of what went on in the last changes, I'm not sure where
to start looking.

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-10-03 21:44 2.4.19+trond and diskless locking problems Christian Reis
@ 2002-10-03 22:47 ` Trond Myklebust
  2002-10-03 23:26   ` Christian Reis
  2002-10-04  7:07 ` [NFS] " Juergen Hasch
  1 sibling, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2002-10-03 22:47 UTC (permalink / raw)
  To: Christian Reis; +Cc: NFS, linux-kernel

>>>>> " " == Christian Reis <kiko@async.com.br> writes:

     > When this happens, there is always a file left in
     > /var/lib/nfs/sm (normally there are no files in there for none
     > of the clients, even when they are on) for the hanging box. Is
     > this normal?

It means that rpc.statd did not manage to unmonitor the NFS locks
before it shut down. The reasons for this could be multiple, such as
for instance if the client crashed and/or rebooted. It might also
indicate that the server could not be contacted in order to unmonitor
the lock.

     > We also occasionally get a log message in the server for this
     > box like:

     >     kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot monitor
     >     192.168.99.7

Means that the kernel was unable to contact rpc.statd, or that was
unable to contact the server's rpc.statd for some reason.

     > Trond, can I get you more troubleshooting information, or
     > should I try 2.4.20-pre on server *and* clients? This is a bit
     > wierd, but since I don't know a lot of what went on in the last
     > changes, I'm not sure where to start looking.

There is nothing in the above to suggest that this must be a kernel
problem. The fact that you are seeing files in /var/lib/nfs/sm
in these cases rather suggests that the problem lies with rpc.statd.
Can you see any reason in your setup why it should be failing?

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-10-03 22:47 ` Trond Myklebust
@ 2002-10-03 23:26   ` Christian Reis
  2002-10-04  1:13     ` Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Christian Reis @ 2002-10-03 23:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS, linux-kernel

On Fri, Oct 04, 2002 at 12:47:38AM +0200, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <kiko@async.com.br> writes:
> 
>      > When this happens, there is always a file left in
>      > /var/lib/nfs/sm (normally there are no files in there for none
>      > of the clients, even when they are on) for the hanging box. Is
>      > this normal?
> 
> It means that rpc.statd did not manage to unmonitor the NFS locks
> before it shut down. The reasons for this could be multiple, such as
> for instance if the client crashed and/or rebooted. It might also
> indicate that the server could not be contacted in order to unmonitor
> the lock.

Yes, these hangs only happen with boxes that crash frequently (they
crash because of a wierd and unrelated bug in X, which forces a reboot
usually). 

>      >     kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot monitor
>      >     192.168.99.7
> 
> Means that the kernel was unable to contact rpc.statd, or that was
> unable to contact the server's rpc.statd for some reason.

Hmmm, I wonder if I understand properly. Is the following flow correct
for the RPC request?

    Client Kernel -> Client rpc.statd -> Server rpc.statd -> Server Kernel

> 
> There is nothing in the above to suggest that this must be a kernel
> problem. The fact that you are seeing files in /var/lib/nfs/sm
> in these cases rather suggests that the problem lies with rpc.statd.
> Can you see any reason in your setup why it should be failing?

Not really. The clients run rpc.statd 1.0 and the server, 1.0.1. Should
I start gdbing it to see what is going wrong?

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-10-03 23:26   ` Christian Reis
@ 2002-10-04  1:13     ` Trond Myklebust
  2002-11-20 14:02       ` Christian Reis
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2002-10-04  1:13 UTC (permalink / raw)
  To: Christian Reis; +Cc: Trond Myklebust, NFS, linux-kernel

>>>>> " " == Christian Reis <kiko@async.com.br> writes:

    >> >     kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot
    >> >     monitor 192.168.99.7
    >>
    >> Means that the kernel was unable to contact rpc.statd, or that
    >> was unable to contact the server's rpc.statd for some reason.

     > Hmmm, I wonder if I understand properly. Is the following flow
     > correct for the RPC request?

     >     Client Kernel -> Client rpc.statd -> Server rpc.statd ->
     >     Server Kernel

That's more or less right, except that the communication is bidirectional.

    >> lies with rpc.statd.  Can you see any reason in your setup why
    >> it should be failing?

     > Not really. The clients run rpc.statd 1.0 and the server,
     > 1.0.1. Should I start gdbing it to see what is going wrong?

Start by using tcpdump to find out who, in the above chain, is taking
such a long time to respond.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] 2.4.19+trond and diskless locking problems
  2002-10-03 21:44 2.4.19+trond and diskless locking problems Christian Reis
  2002-10-03 22:47 ` Trond Myklebust
@ 2002-10-04  7:07 ` Juergen Hasch
  2002-10-04 13:17   ` Christian Reis
  1 sibling, 1 reply; 17+ messages in thread
From: Juergen Hasch @ 2002-10-04  7:07 UTC (permalink / raw)
  To: Christian Reis, NFS; +Cc: linux-kernel

Am Donnerstag, 3. Oktober 2002 23:44 schrieb Christian Reis:
>
> We also occasionally get a log message in the server for this box like:
>
>     kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot monitor
> 192.168.99.7

I got the same messages when mounting an AIX client to a Linux server after 
upgrading to 2.4.19 Kernel.
After installing the latest NFS utils, the problem went away.
So I guess Trond is right, try looking at the userspace utilities.

...Juergen


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] 2.4.19+trond and diskless locking problems
  2002-10-04  7:07 ` [NFS] " Juergen Hasch
@ 2002-10-04 13:17   ` Christian Reis
  2002-10-04 16:00     ` Juergen Hasch
  0 siblings, 1 reply; 17+ messages in thread
From: Christian Reis @ 2002-10-04 13:17 UTC (permalink / raw)
  To: Juergen Hasch; +Cc: NFS, linux-kernel

On Fri, Oct 04, 2002 at 09:07:47AM +0200, Juergen Hasch wrote:
> I got the same messages when mounting an AIX client to a Linux server after 
> upgrading to 2.4.19 Kernel.
> After installing the latest NFS utils, the problem went away.

Does this mean nfs-utils-1.0.1 vs 1.0, or were you using a much older
version?

> So I guess Trond is right, try looking at the userspace utilities.

I will, thanks.

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [NFS] 2.4.19+trond and diskless locking problems
  2002-10-04 13:17   ` Christian Reis
@ 2002-10-04 16:00     ` Juergen Hasch
  0 siblings, 0 replies; 17+ messages in thread
From: Juergen Hasch @ 2002-10-04 16:00 UTC (permalink / raw)
  To: Christian Reis; +Cc: NFS, linux-kernel

Am Freitag, 4. Oktober 2002 15:17 schrieb Christian Reis:
> On Fri, Oct 04, 2002 at 09:07:47AM +0200, Juergen Hasch wrote:
> > I got the same messages when mounting an AIX client to a Linux server
> > after upgrading to 2.4.19 Kernel.
> > After installing the latest NFS utils, the problem went away.
>
> Does this mean nfs-utils-1.0.1 vs 1.0, or were you using a much older
> version?

Versions 0.3.3 and 1.0.1 are working fine for me, so it looks like
your problem is different.

...Juergen


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-10-04  1:13     ` Trond Myklebust
@ 2002-11-20 14:02       ` Christian Reis
  2002-11-20 17:02         ` Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Christian Reis @ 2002-11-20 14:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS, linux-kernel

On Fri, Oct 04, 2002 at 03:13:14AM +0200, Trond Myklebust wrote:
> That's more or less right, except that the communication is bidirectional.
> 
>     >> lies with rpc.statd.  Can you see any reason in your setup why
>     >> it should be failing?
> 
>      > Not really. The clients run rpc.statd 1.0 and the server,
>      > 1.0.1. Should I start gdbing it to see what is going wrong?
> 
> Start by using tcpdump to find out who, in the above chain, is taking
> such a long time to respond.

I haven't forgotten this. It's just that I've been unable to test: the
problem just stopped showing up when I upgraded to 2.4.20-pre11 with
your NFS-ALL patches applied to it. Could something have changed, or are
we just lucky?

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-11-20 14:02       ` Christian Reis
@ 2002-11-20 17:02         ` Trond Myklebust
  2002-11-27  0:41           ` Christian Reis
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2002-11-20 17:02 UTC (permalink / raw)
  To: Christian Reis; +Cc: NFS, linux-kernel

>>>>> " " == Christian Reis <kiko@async.com.br> writes:

     > I haven't forgotten this. It's just that I've been unable to
     > test: the problem just stopped showing up when I upgraded to
     > 2.4.20-pre11 with your NFS-ALL patches applied to it. Could
     > something have changed, or are we just lucky?

The main changes have been the discovery of a couple of kmap()
imbalances. Those are also fixed in 2.4.20-rc2.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-11-20 17:02         ` Trond Myklebust
@ 2002-11-27  0:41           ` Christian Reis
  2002-11-27  1:14             ` Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Christian Reis @ 2002-11-27  0:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS, linux-kernel

On Wed, Nov 20, 2002 at 06:02:34PM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <kiko@async.com.br> writes:
> 
>      > I haven't forgotten this. It's just that I've been unable to
>      > test: the problem just stopped showing up when I upgraded to
>      > 2.4.20-pre11 with your NFS-ALL patches applied to it. Could
>      > something have changed, or are we just lucky?
> 
> The main changes have been the discovery of a couple of kmap()
> imbalances. Those are also fixed in 2.4.20-rc2.

Just because I send that email my server decided it is going to act up
again. So this time I sat down, and looked hard at the logs and tcpdump
output and this is what I *think*: a lockd bug. Here's the symptoms:

(recalling, a 100mbps network on a d-link switch with about 10 diskless
clients. Some of the boxes start getting really slow shutdown and
startup, and the server seems to be unaffected. Clients run 2.4.19 with
trond's-ALL patches, server runs 2.4.20-pre11 with trond's-ALL patches.
everything *should* be tcp-mounted, but I'm not 100% sure about the root
filesystem)

Trond, I've attached quite a bit of debug output below, but I'm a bit
lost as to what it could be now. Do you think I should start suspecting
the hardware, now? I have *no* reason to do so, but if nobody else sees
this sort of issue...

a) ps ax | grep lockd returns:

   94 ?        DW     0:00 [lockd]

    Why would lockd be in state "D"? Looks bad. Can this happen in
    normal usage? It kicks the loadavg up 1 point.

b) tcpdump output, as seen by the server, during bootup (server is
anthem, violinux is client):

    [seemingly normal nfs operation up to here]

    09:24:08.876333 anthem.async.com.br.nfs > violinux.async.com.br.2153135585: reply ok 128 lookup [|nfs] (DF)
    09:24:08.876464 violinux.async.com.br.2169912801 > anthem.async.com.br.nfs: 132 setattr [|nfs] (DF)
    09:24:08.876490 anthem.async.com.br.nfs > violinux.async.com.br.2169912801: reply ok 96 setattr [|nfs] (DF)

    [ 10-second delay here ]

    09:24:18.988289 violinux.async.com.br.793 > anthem.async.com.br.sometimes-rpc4: udp 180 (DF)

    [ 11-second delay here ]

    09:24:29.889685 violinux.async.com.br.2891398625 > anthem.async.com.br.nfs: 116 lookup [|nfs] (DF)
    09:24:29.889864 anthem.async.com.br.nfs > violinux.async.com.br.2891398625: reply ok 128 lookup [|nfs] (DF)
    09:24:29.890121 violinux.async.com.br.2908175841 > anthem.async.com.br.nfs: 112 read [|nfs] (DF)
    09:24:29.890245 anthem.async.com.br.nfs > violinux.async.com.br.2908175841: reply ok 720 read [|nfs] (DF)
    09:24:29.890654 violinux.async.com.br.2924953057 > anthem.async.com.br.nfs: 116 lookup [|nfs] (DF)

    [ back to business as usual ]

This repeats itself a number of times, and these 20-second combined
hangs take their toll - it's 10 minutes already and no bootup. The
slowness does *not* manifest itself until we move into runlevel 3 of the
bootup process.

    sometimes-rpc4 is port 32770, which rpcinfo -p shows us to be
        nlockmgr.
    nfs is port 2049.

c) netstat -apn just so we know what ports are in use (state LISTEN for
all but port 32770, which has no state listed):

    Proto Recv-Q Send-Q Local Address  Foreign Address PID/Program name

    tcp        0      0 0.0.0.0:2049   0.0.0.0:*       -                  
    udp    31752      0 0.0.0.0:32770  0.0.0.0:*       - 
    udp        0      0 0.0.0.0:32768  0.0.0.0:*       88/rpc.statd     
    udp        0      0 0.0.0.0:32769  0.0.0.0:*       90/rpc.mountd      

d) rpcinfo -p

       program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100007    2   udp    680  ypbind
    100007    1   udp    680  ypbind
    100007    2   tcp    683  ypbind
    100007    1   tcp    683  ypbind
    100004    2   udp    685  ypserv
    100004    1   udp    685  ypserv
    100004    2   tcp    688  ypserv
    100004    1   tcp    688  ypserv
    100009    1   udp    687  yppasswdd
    100024    1   udp  32768  status
    100024    1   tcp  32768  status
    100005    1   udp  32769  mountd
    100005    1   tcp  32769  mountd
    100005    2   udp  32769  mountd
    100005    2   tcp  32769  mountd
    100005    3   udp  32769  mountd
    100005    3   tcp  32769  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100021    1   udp  32770  nlockmgr
    100021    3   udp  32770  nlockmgr
    100021    4   udp  32770  nlockmgr
    100021    1   tcp  32770  nlockmgr
    100021    3   tcp  32770  nlockmgr
    100021    4   tcp  32770  nlockmgr

So, am I right in thinking it's a lockd problem? rpc.statd seems to be
okay, but I could be wrong..

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-11-27  0:41           ` Christian Reis
@ 2002-11-27  1:14             ` Trond Myklebust
  2002-11-27  1:44               ` Christian Reis
  2002-11-27 17:08               ` Christian Reis
  0 siblings, 2 replies; 17+ messages in thread
From: Trond Myklebust @ 2002-11-27  1:14 UTC (permalink / raw)
  To: Christian Reis; +Cc: Trond Myklebust, NFS, linux-kernel

>>>>> " " == Christian Reis <kiko@async.com.br> writes:

     > a) ps ax | grep lockd returns:

     >    94 ?  DW 0:00 [lockd]

     >     Why would lockd be in state "D"? Looks bad. Can this happen
     >     in normal usage? It kicks the loadavg up 1 point.
<snip>
     >     [ 10-second delay here ]

     >     09:24:18.988289 violinux.async.com.br.793 >
     >     anthem.async.com.br.sometimes-rpc4: udp 180 (DF)

     >     [ 11-second delay here ]

OK, so you are sending out the RPC request to the server's NLM daemon,
which is clearly receiving the packet (since tcpdump was able to log
it), but is never sending a reply. Are you seeing any kernel messages
in the syslog?

BTW: the tcpdumps you're showing are all UDP, not TCP...

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-11-27  1:14             ` Trond Myklebust
@ 2002-11-27  1:44               ` Christian Reis
  2002-11-27 17:08               ` Christian Reis
  1 sibling, 0 replies; 17+ messages in thread
From: Christian Reis @ 2002-11-27  1:44 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS, linux-kernel

On Wed, Nov 27, 2002 at 02:14:50AM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <kiko@async.com.br> writes:
> 
>      > a) ps ax | grep lockd returns:
> 
>      >    94 ?  DW 0:00 [lockd]
> 
>      >     Why would lockd be in state "D"? Looks bad. Can this happen
>      >     in normal usage? It kicks the loadavg up 1 point.
> <snip>
>      >     [ 10-second delay here ]
> 
>      >     09:24:18.988289 violinux.async.com.br.793 >
>      >     anthem.async.com.br.sometimes-rpc4: udp 180 (DF)
> 
>      >     [ 11-second delay here ]
> 
> OK, so you are sending out the RPC request to the server's NLM daemon,
> which is clearly receiving the packet (since tcpdump was able to log
> it), but is never sending a reply. Are you seeing any kernel messages
> in the syslog?

No. The kernel log (kern.* in syslog) is very quite during this period -
the only messenges I get are the eth0 promiscuous mode messages that
tcpdump triggers when it runs.

My messenges file just lists the mounts:

Nov 22 09:24:04 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:800 for /
export/root (/export) Nov 22 09:24:06 anthem dhcpd: DHCPDISCOVER from 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPOFFER on 192.168.99.2 to 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPDISCOVER from 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPOFFER on 192.168.99.2 to 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPREQUEST for 192.168.99.2 (192.168.99.4) from 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPACK on 192.168.99.2 to 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:707 for /home (/home) 
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:711 for /mondo (/mondo) 
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:715 for /dist (/dist) 
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:719 for /var/spool/mail (/var/spool/mail) 
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from
violinux.async.com.br:712 for /export/root/var/spool/violinux (/export) 
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:714 for /export/root/var/log/violinux (/export) 
Nov 22 09:27:31 anthem named[141]: "lab.16.106.143.in-addr.arpa IN NS"
points to a CNAME (grande.ic .unicamp.br)

(yep, unrelated last line just to show nothing is going on)

> BTW: the tcpdumps you're showing are all UDP, not TCP...

100% true, as I noticed as I pressed "y" to send. :-)

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-11-27  1:14             ` Trond Myklebust
  2002-11-27  1:44               ` Christian Reis
@ 2002-11-27 17:08               ` Christian Reis
  2002-11-27 20:31                 ` Trond Myklebust
  1 sibling, 1 reply; 17+ messages in thread
From: Christian Reis @ 2002-11-27 17:08 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS, linux-kernel

On Wed, Nov 27, 2002 at 02:14:50AM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <kiko@async.com.br> writes:
> 
>      > a) ps ax | grep lockd returns:
> 
>      >    94 ?  DW 0:00 [lockd]
> 
>      >     Why would lockd be in state "D"? Looks bad. Can this happen
>      >     in normal usage? It kicks the loadavg up 1 point.
> <snip>
>      >     [ 10-second delay here ]
> 
>      >     09:24:18.988289 violinux.async.com.br.793 >
>      >     anthem.async.com.br.sometimes-rpc4: udp 180 (DF)
> 
>      >     [ 11-second delay here ]
> 
> OK, so you are sending out the RPC request to the server's NLM daemon,
> which is clearly receiving the packet (since tcpdump was able to log
> it), but is never sending a reply. Are you seeing any kernel messages
> in the syslog?

Hmmm. Ran into this again this morning and right now when starting up
one of the boxes. A reboot of the workstation fixed it for the while.

What can I do to help further debug the issue? Try another kernel
version on clients? Server? This is giving me a headache.. :-/

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-11-27 17:08               ` Christian Reis
@ 2002-11-27 20:31                 ` Trond Myklebust
  2002-11-29  1:34                   ` Christian Reis
       [not found]                   ` <20021128232911.G18175@blackjesus.async.com.br>
  0 siblings, 2 replies; 17+ messages in thread
From: Trond Myklebust @ 2002-11-27 20:31 UTC (permalink / raw)
  To: Christian Reis; +Cc: NFS, linux-kernel

>>>>> " " == Christian Reis <kiko@async.com.br> writes:

     > What can I do to help further debug the issue? Try another
     > kernel version on clients? Server? This is giving me a
     > headache.. :-/

Try to give us a dump of the server lock manager activity when this
problem occurs. Try to do

echo "217" >/proc/sys/sunrpc/nlm_debug

on the server just before the client issues a lock.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-11-27 20:31                 ` Trond Myklebust
@ 2002-11-29  1:34                   ` Christian Reis
  2002-12-02 14:51                     ` Christian Reis
       [not found]                   ` <20021128232911.G18175@blackjesus.async.com.br>
  1 sibling, 1 reply; 17+ messages in thread
From: Christian Reis @ 2002-11-29  1:34 UTC (permalink / raw)
  To: NFS, linux-kernel

On Wed, Nov 27, 2002 at 09:31:42PM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <kiko@async.com.br> writes:
> 
>      > What can I do to help further debug the issue? Try another
>      > kernel version on clients? Server? This is giving me a
>      > headache.. :-/
> 
> Try to give us a dump of the server lock manager activity when this
> problem occurs. Try to do
> 
> echo "217" >/proc/sys/sunrpc/nlm_debug

Okay, I've just done that. Just completing information, this server uses
ReiserFS filesystems, and I know we've had issues with knfsd and reiser
before, so I guess it's at least relevant to say. 

* The tcpdump output first for the first set of hangs (hangs start 15:37:15)

  [ Following a million nfs packets in the same second (:15): ]

15:37:15.606342 banzai.async.com.br.2337883237 > anthem.async.com.br.nfs: 112 lookup [|nfs] (DF)
15:37:15.606365 anthem.async.com.br.nfs > banzai.async.com.br.2337883237: reply ok 120 lookup [|nfs] (DF)
15:37:15.607877 banzai.async.com.br.792 > anthem.async.com.br.sunrpc: udp 56 (DF)
15:37:15.608384 anthem.async.com.br.sunrpc > banzai.async.com.br.792: udp 28 (DF)
15:37:15.608583 banzai.async.com.br.793 > anthem.async.com.br.sometimes-rpc4: udp 176 (DF)
15:37:15.609593 anthem.async.com.br.sometimes-rpc4 > banzai.async.com.br.793: udp 36 (DF)

  [ 3s here ]

15:37:18.032896 banzai.async.com.br.2421769317 > anthem.async.com.br.nfs: 116 lookup [|nfs] (DF)
15:37:18.032975 anthem.async.com.br.nfs > banzai.async.com.br.2421769317: reply ok 128 lookup [|nfs] (DF)
15:37:18.033392 banzai.async.com.br.2438546533 > anthem.async.com.br.nfs: 112 read [|nfs] (DF)
15:37:18.033449 anthem.async.com.br.nfs > banzai.async.com.br.2438546533: reply ok 720 read [|nfs] (DF)
15:37:18.034177 banzai.async.com.br.2455323749 > anthem.async.com.br.nfs: 116 lookup [|nfs] (DF)
15:37:18.034211 anthem.async.com.br.nfs > banzai.async.com.br.2455323749: reply ok 128 lookup [|nfs] (DF)
15:37:18.034490 banzai.async.com.br.2472100965 > anthem.async.com.br.nfs: 112 read [|nfs] (DF)
15:37:18.034534 anthem.async.com.br.nfs > banzai.async.com.br.2472100965: reply ok 816 read [|nfs] (DF)

  [ 10s here ]

15:37:28.041862 banzai.async.com.br.2488878181 > anthem.async.com.br.nfs: 100 getattr [|nfs] (DF)
15:37:28.042385 anthem.async.com.br.nfs > banzai.async.com.br.2488878181: reply ok 96 getattr [|nfs] (DF)
15:37:28.042820 banzai.async.com.br.2505655397 > anthem.async.com.br.nfs: 100 getattr [|nfs] (DF)
15:37:28.042892 anthem.async.com.br.nfs > banzai.async.com.br.2505655397: reply ok 96 getattr [|nfs] (DF)

  At this point, a pattern forms. Second 28 has nfs activity, then hangs
  till second 40. Second 40 has has nfs activity, then hangs till second
  45. Then second 04. Then second 15. Then 20... 

* The kernel log output when 217 is echoed to nlm_debug:

Nov 28 15:36:37 anthem kernel: eth0: Setting promiscuous mode.
Nov 28 15:37:15 anthem kernel: lockd: request from c0a86309
Nov 28 15:37:15 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:15 anthem kernel: lockd: host garbage collection
Nov 28 15:37:15 anthem kernel: lockd: nlmsvc_mark_resources
Nov 28 15:37:15 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:15 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 00006693 000071c9 0001dcc7)
Nov 28 15:37:15 anthem kernel: lockd: found file cbacccc0 (count 10)
Nov 28 15:37:15 anthem kernel: lockd: nlmsvc_lock(090b/26259, ty=1, pi=1, 0-9223372036854775807, bl=1)
Nov 28 15:37:15 anthem kernel: lockd: nlmsvc_lookup_block f=cbacccc0 pd=1 0-9223372036854775807 ty=1
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacc4c0 pd=401 0-9223372036854775807 ty=1 cookie=1269
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=389 0-9223372036854775807 ty=1 cookie=1274
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=383 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=391 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=407 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=430 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=1 0-9223372036854775807 ty=1 cookie=124f
Nov 28 15:37:15 anthem kernel: lockd: nlmsvc_insert_block(c33be400, -1)
Nov 28 15:37:15 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:15 anthem kernel: lockd: nlm_release_file(cbacccc0, ct = 11)
Nov 28 15:37:15 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:15 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: lockd: request from c0a86309
Nov 28 15:37:45 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:45 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 00006693 000071c9 0001dcc7)
Nov 28 15:37:45 anthem kernel: lockd: found file cbacccc0 (count 10)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_cancel(090b/26259, pi=1, 0-9223372036854775807)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_lookup_block f=cbacccc0 pd=1 0-9223372036854775807 ty=1
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacc4c0 pd=401 0-9223372036854775807 ty=1 cookie=1269
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=389 0-9223372036854775807 ty=1 cookie=1274
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=383 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=391 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=407 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=430 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=326 0-9223372036854775807 ty=1 cookie=1250
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=1 0-9223372036854775807 ty=1 cookie=124f
Nov 28 15:37:45 anthem kernel: lockd: deleting block c33be400...
Nov 28 15:37:45 anthem kernel: lockd: unblocking blocked lock
Nov 28 15:37:45 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_release_file(cbacccc0, ct = 11)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: lockd: request from c0a86309
Nov 28 15:37:45 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:45 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 00006693 000071c9 0001dcc7)
Nov 28 15:37:45 anthem kernel: lockd: found file cbacccc0 (count 10)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_unlock(090b/26259, pi=1, 0-9223372036854775807)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_cancel(090b/26259, pi=1, 0-9223372036854775807)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_lookup_block f=cbacccc0 pd=1 0-9223372036854775807 ty=2
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacc4c0 pd=401 0-9223372036854775807 ty=1 cookie=1269
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=389 0-9223372036854775807 ty=1 cookie=1274
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=383 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=391 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=407 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=430 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=326 0-9223372036854775807 ty=1 cookie=1250
Nov 28 15:37:45 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_release_file(cbacccc0, ct = 11)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: lockd: request from c0a86309
Nov 28 15:37:45 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:45 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 00006693 000071c9 0001dcc7)
Nov 28 15:37:45 anthem kernel: lockd: found file cbacccc0 (count 10)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_lock(090b/26259, ty=1, pi=1, 0-9223372036854775807, bl=1)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_lookup_block f=cbacccc0 pd=1 0-9223372036854775807 ty=1
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacc4c0 pd=401 0-9223372036854775807 ty=1 cookie=1269
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=389 0-9223372036854775807 ty=1 cookie=1274
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=383 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=391 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=407 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=430 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=326 0-9223372036854775807 ty=1 cookie=1250
Nov 28 15:37:45 anthem kernel: lockd: blocking on this lock (allocating).
Nov 28 15:37:45 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:45 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: created block c33be400...
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_insert_block(c33be400, -1)
Nov 28 15:37:45 anthem kernel: lockd: blocking on this lock.
Nov 28 15:37:45 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_release_file(cbacccc0, ct = 11)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:38:15 anthem kernel: lockd: request from c0a86309
Nov 28 15:38:15 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)

[ Then we have a lot of entries for second 15, then a number for 
  second 45, then more for 15... ]

I also noticed an interesting thing. Certain boxes on the network are
hanging; others are not. And if I `ls -l /var/lib/nfs/sm', I get:

/var/lib/nfs/sm:
total 0
-rw-------   1 root     root            0 Nov 26 16:41 192.168.99.2
-rw-------   1 root     root            0 Nov 28 15:48 192.168.99.5
-rw-------   1 root     root            0 Nov 23 17:06 192.168.99.7
-rw-------   1 root     root            0 Nov 25 22:10 192.168.99.9

The boxes .2, .5 and .7 are hanging. The box .9 is not. See the dates
on their files? Could this be related to the problem?

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
  2002-11-29  1:34                   ` Christian Reis
@ 2002-12-02 14:51                     ` Christian Reis
  0 siblings, 0 replies; 17+ messages in thread
From: Christian Reis @ 2002-12-02 14:51 UTC (permalink / raw)
  To: Trond Myklebust, NFS, linux-kernel

On Thu, Nov 28, 2002 at 11:34:12PM -0200, Christian Reis wrote:
> On Wed, Nov 27, 2002 at 09:31:42PM +0100, Trond Myklebust wrote:
> > >>>>> " " == Christian Reis <kiko@async.com.br> writes:
> > 
> >      > What can I do to help further debug the issue? Try another
> >      > kernel version on clients? Server? This is giving me a
> >      > headache.. :-/
> > 
> > Try to give us a dump of the server lock manager activity when this
> > problem occurs. Try to do
> > 
> > echo "217" >/proc/sys/sunrpc/nlm_debug
> 
> Okay, I've just done that. Just completing information, this server uses
> ReiserFS filesystems, and I know we've had issues with knfsd and reiser
> before, so I guess it's at least relevant to say. 

Trond, did you have a minute to look at this? I wonder if it's really
related to Reiser or not..

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.4.19+trond and diskless locking problems
       [not found]                     ` <200212021921.24330.trond.myklebust@fys.uio.no>
@ 2002-12-04 14:20                       ` Christian Reis
  0 siblings, 0 replies; 17+ messages in thread
From: Christian Reis @ 2002-12-04 14:20 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, NFS


Hey there. Another update on the wierd hangs. After rebooting the server
(and cleaning up /var/lib/nfs/sm consequently) we've had a couple of
days with no hangs. I've been keeping an eye on that directory closely
the past days to see if we have leftover entries there, and today one
showed up:

anthem:~$ date
Wed Dec  4 11:59:31 BRDT 2002
anthem:~$ sudo ls -l /var/lib/nfs/sm
total 0
-rw-------   1 root     root            0 Dec  4 08:47 192.168.99.7

So I have a file there that has been untouched for over 3 hours now.
Having a look at the logfile, I have:

    [... bootup messages culminating in ntpd startup ]

    Dec  4 08:44:16 canario ntpd[317]: kernel time discipline status 0040
    Dec  4 08:44:16 canario ntpd[317]: frequency initialized -97.153
    from /var/lib/ntp/driftfile.canario
    Dec  4 08:47:22 canario rpc.statd[101]: Received erroneous SM_UNMON
    request from canario for 192.168.99.4

    [ same time as file above, see ]

    Dec  4 08:48:47 canario ntpd[317]: kernel time discipline status change 41
    Dec  4 08:50:01 canario /USR/SBIN/CRON[425]: (smmsp) CMD (test -x
    /usr/share/sendmail/sendmail && /usr/share/sendmail/sendmail cron-msp)

    [... more bootup messages]

The reboot log from the night before shows a series of 8 erroneous
SM_UNMON messenges for a period of about an hour before the actual
reboot. The machine was shut down unclean the time before that (well,
as unclean as an NFS-mounted box can be).

And, sure enough, when shutting down this box now, we get the
stalls/pauses again. nlm_debug outputs for both the client and the
server (neat 30s intervals show up :-) - I've selected specific seconds
of activity (i.e. there was a set of entries at 12:06:32 before):

    CLIENT:

    Dec  4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:02 canario kernel: lockd: next rebind in 6000 jiffies
    Dec  4 12:07:02 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
    Dec  4 12:07:02 canario kernel: lockd: host garbage collection
    Dec  4 12:07:02 canario kernel: lockd: nlmsvc_mark_resources
    Dec  4 12:07:02 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:02 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
    Dec  4 12:07:02 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:32 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
    Dec  4 12:07:32 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:32 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
    Dec  4 12:07:32 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)

    SERVER:

    Dec  4 12:07:02 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
    Dec  4 12:07:02 anthem kernel: lockd: check f=c1d1f380 pd=1 0-9223372036854775807 ty=1 cookie=1393
    Dec  4 12:07:02 anthem kernel: lockd: deleting block d12e9000...
    Dec  4 12:07:02 anthem kernel: lockd: unblocking blocked lock
    Dec  4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:02 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_unlock(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=2
    Dec  4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:02 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_lock(090b/64826, ty=1, pi=1, 0-9223372036854775807, bl=1)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
    Dec  4 12:07:02 anthem kernel: lockd: blocking on this lock (allocating).
    Dec  4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: created block d12e9000...
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_insert_block(d12e9000, -1)
    Dec  4 12:07:02 anthem kernel: lockd: blocking on this lock.
    Dec  4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
    Dec  4 12:07:32 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:32 anthem kernel: lockd: host garbage collection
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_mark_resources
    Dec  4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
    Dec  4 12:07:32 anthem kernel: lockd: check f=c1d1f380 pd=1 0-9223372036854775807 ty=1 cookie=1396
    Dec  4 12:07:32 anthem kernel: lockd: deleting block d12e9000...
    Dec  4 12:07:32 anthem kernel: lockd: unblocking blocked lock
    Dec  4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:32 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_unlock(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775 807 ty=2
    Dec  4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:32 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_lock(090b/64826, ty=1, pi=1, 0-9223372036854775 807, bl=1)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775 807 ty=1
    Dec  4 12:07:32 anthem kernel: lockd: blocking on this lock (allocating).
    Dec  4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: created block d12e9000...
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_insert_block(d12e9000, -1)
    Dec  4 12:07:32 anthem kernel: lockd: blocking on this lock.
    Dec  4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)

HTH. Yes, we're all sick of this subject :-/

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2002-12-04 14:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-10-03 21:44 2.4.19+trond and diskless locking problems Christian Reis
2002-10-03 22:47 ` Trond Myklebust
2002-10-03 23:26   ` Christian Reis
2002-10-04  1:13     ` Trond Myklebust
2002-11-20 14:02       ` Christian Reis
2002-11-20 17:02         ` Trond Myklebust
2002-11-27  0:41           ` Christian Reis
2002-11-27  1:14             ` Trond Myklebust
2002-11-27  1:44               ` Christian Reis
2002-11-27 17:08               ` Christian Reis
2002-11-27 20:31                 ` Trond Myklebust
2002-11-29  1:34                   ` Christian Reis
2002-12-02 14:51                     ` Christian Reis
     [not found]                   ` <20021128232911.G18175@blackjesus.async.com.br>
     [not found]                     ` <200212021921.24330.trond.myklebust@fys.uio.no>
2002-12-04 14:20                       ` Christian Reis
2002-10-04  7:07 ` [NFS] " Juergen Hasch
2002-10-04 13:17   ` Christian Reis
2002-10-04 16:00     ` Juergen Hasch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).