linux-kernel.vger.kernel.org archive mirror
* NFS client locking hangs for period
@ 2003-01-24 20:49 Christian Reis
  2003-01-25  3:54 ` Neil Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Reis @ 2003-01-24 20:49 UTC (permalink / raw)
  To: neilb; +Cc: linux-kernel, NFS


Hello Neil,

I've been trying to get at this problem for a while now, and had been
concentrating on the client-side of the problem (and consequently
bothering Trond about it) [1,2]. I am now pretty much convinced this is a
server-side problem, and as I've patched 2.4.20 with all the NFS patches
pending (that didn't have to do with the kernel lock breaking) and still
see the issue, I decided to report this bug.

The scenario is: a set of NFS clients with root mounted over NFS from a
single server. Clients run vanilla 2.4.20; the server runs 2.4.20 patched
with the server-side patches of yours that I mentioned above. The clients
run okay for a period, and then one of them will start to hang for long
periods of time on certain operations (it happens on startup and
shutdown, for instance). Once the client hangs start, the server needs to
be rebooted for them to clear up.

It seems to be reproducible by having the client hang or reboot without
shutting down properly. Another clue is that the server is left with
files in /var/lib/nfs/sm/ for the hanging client(s).
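
A quick way to watch for the leftover files mentioned above is to list
the monitor directory on the server. The following is only a minimal
sketch: the function name is hypothetical, and it assumes the directory
holds one entry per monitored host, which is how rpc.statd from
nfs-utils normally uses /var/lib/nfs/sm.

```python
import os


def leftover_monitor_entries(sm_dir="/var/lib/nfs/sm"):
    """Return the sorted entry names found in the statd monitor directory.

    Hypothetical helper: each entry is assumed to name a host that the
    server still considers monitored.
    """
    try:
        return sorted(os.listdir(sm_dir))
    except FileNotFoundError:
        # Directory absent (e.g. statd never ran on this box).
        return []


if __name__ == "__main__":
    for name in leftover_monitor_entries():
        print(name)
```

An entry that persists for a client that has since rebooted cleanly
would match the symptom described here; the script itself does not
diagnose anything.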

I've been trying to track this down for a while, but since I'm not very
proficient with debugging at this level, I haven't had much luck. It's
really a problem because I need to reboot and make 20 people stop
working when the problem gets serious. Trond has lent a hand trying
to help me, but we still haven't uncovered anything. I wonder if you
have any clue what could be happening?

The other details are standard: the clients are Debian woody systems
with nfs-utils 1.0.1 installed, and the server has the same version. The
server runs reiserfs over RAID-1 partitions (using the kernel md
driver). Could it be triggered by this perhaps unusual combination?

Some of the messages I point out below have some info about the issue -
including tcpdumps and traces of nlm_debug on the server and client.

Mount options follow for the client filesystems:

anthem:/export/root/    /   nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=2 0 0
anthem:/home    /home   nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=3 0 0

I have checked and, yes, root is mounted using version 2 and the rest as
version 3. Perhaps I should try getting the kernel to mount root using
version 3?
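
The version check described above can be scripted. This is a minimal
sketch, assuming the two fstab lines quoted in this message; the
function name nfs_versions is hypothetical. Note that fstab only shows
what was requested: for a root filesystem mounted by the kernel, the
options actually in effect are the ones reported in /proc/mounts.

```python
# The two entries quoted in the message, in fstab format.
FSTAB = """\
anthem:/export/root/    /   nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=2 0 0
anthem:/home    /home   nfs defaults,rw,rsize=8192,wsize=8192,nfsvers=3 0 0
"""


def nfs_versions(fstab_text):
    """Map each nfs mount point to its requested version (None if unset)."""
    versions = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4 or fields[2] != "nfs":
            continue  # not an nfs entry
        version = None
        for opt in fields[3].split(","):
            if opt.startswith("nfsvers=") or opt.startswith("vers="):
                version = opt.split("=", 1)[1]
        versions[fields[1]] = version
    return versions


if __name__ == "__main__":
    for mount_point, version in nfs_versions(FSTAB).items():
        print(mount_point, version or "unspecified")
```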

[1] http://groups.google.com/groups?q=trond+christian+nfs&hl=pt&lr=&ie=UTF-8&client=googlet&scoring=d&selm=20030108151424.N2628%40blackjesus.async.com.br.lucky.linux.kernel&rnum=1
[2] http://groups.google.com/groups?hl=pt&lr=&ie=UTF-8&client=googlet&th=3575b3c5f3360eb0&seekm=20030108151424.N2628%40blackjesus.async.com.br.lucky.linux.kernel&frame=off

Thanks for any help you can give.

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

* Re: NFS client locking hangs for period
@ 2003-04-25  4:57 Christian Reis
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Reis @ 2003-04-25  4:57 UTC (permalink / raw)
  To: NFS; +Cc: linux-kernel


Well, since I've more or less moved on from my original problems, I
should probably post a summary of what was going on, and what I did to
work around it.

Details can be found in [1]: after a certain amount of time, a
number of diskless clients, which were mounting everything from the same
NFS server, started seeing their lock requests to the server hang. The
server ran 2.4.20, with reiserfs over RAID-1 on 2 SCSI disks attached to
an Adaptec 29160. The clients were Debian woody systems running 2.4.20.

Our diskless setup is a bit unusual: all the clients mount the same root
partition. I tried to be very careful to make sure no files were written
to on /, but I never got to the point where the clients could mount the
directory read-only. I used devfs to make sure that the /dev directories
were `localized' and syslog/console ownership and permissions kept sane.

The locking problem, however, was not related to the root filesystem --
it seems to have happened with files on the /var/log mount, which is
separate for each box (but still coming from a shared filesystem
/export/root on the server, which contains all the client directories).
If I mounted /var/log with the nolock option, they ran fine. This took
me a very long time to figure out, and I'd advise anyone with locking
problems to give it a go.
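
For anyone wanting to try the same workaround, the change amounts to
adding nolock to the options field of the relevant fstab entry. The
sketch below does that for a single line. The helper name add_nolock
and the export path in EXAMPLE are hypothetical (the real per-client
path is not given here). Keep in mind that with nolock, locks are only
local to the client and are not seen by other clients, which is
acceptable here only because each box has its own /var/log.

```python
def add_nolock(fstab_line, mount_point):
    """Return fstab_line with 'nolock' added if it is the nfs entry
    for mount_point; otherwise return the line unchanged."""
    fields = fstab_line.split()
    if len(fields) < 4 or fields[1] != mount_point or fields[2] != "nfs":
        return fstab_line
    options = fields[3].split(",")
    if "nolock" not in options:
        options.append("nolock")
    fields[3] = ",".join(options)
    return "\t".join(fields)


# Hypothetical entry: the export path below is illustrative only.
EXAMPLE = ("anthem:/export/root/client1/var/log /var/log nfs "
           "defaults,rw,rsize=8192,wsize=8192 0 0")

if __name__ == "__main__":
    print(add_nolock(EXAMPLE, "/var/log"))
```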

I should point out that this *does* seem to be a bug in the NFS server
code. I think it is associated with reiserfs, since I haven't seen
it happen on other filesystem types. Rebooting the server cleared up the
problem. Erasing or changing files in /var/lib/nfs did not. While I was
initially using a volatile /var/lib/nfs directory on the *clients*, I
changed this on Trond's suggestion [2]. It did not fix the problem.

However, since I know little about the code itself, and it's not very
clear how one should debug, I was unable to pinpoint the exact source of
the problem, which very much saddens me.  The workaround, however, was
quite effective.

[1] http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&th=9db70994c3458f46&rnum=1
[2] http://groups.google.com/groups?q=christian+reis+nfs+locking&hl=en&lr=&ie=UTF-8&scoring=d&selm=20030126231006%246e11%40gated-at.bofh.it&rnum=3

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL



Thread overview: 15+ messages
2003-01-24 20:49 NFS client locking hangs for period Christian Reis
2003-01-25  3:54 ` Neil Brown
2003-01-26 16:02   ` Christian Reis
2003-01-26 21:49     ` [NFS] " Trond Myklebust
2003-01-26 22:47       ` Christian Reis
2003-01-26 23:02         ` Trond Myklebust
2003-01-26 23:56           ` Christian Reis
2003-01-27  0:06             ` Trond Myklebust
2003-01-27  2:19               ` Dell Latitude CPi keyboard problems since 2.5.42 Tom Sightler
2003-01-28  8:14         ` [NFS] Re: NFS client locking hangs for period Denis Vlasenko
2003-01-28 16:47           ` Christian Reis
2003-01-28  8:00     ` Denis Vlasenko
2003-01-28 16:44       ` Christian Reis
2003-01-29 21:53       ` Daniel Egger
2003-04-25  4:57 Christian Reis
