* nfsd hangs for more than 120 seconds
From: Christoph Bartoschek @ 2012-03-31 11:55 UTC
  To: linux-nfs

Hi,

we use Ubuntu 10.04.3 LTS and often get a traceback for NFS indicating that 
the daemon hangs for several seconds. At the same time some client machines 
cannot access the server and have to wait. After some minutes everything 
goes on.

What could cause the problem? Is there anything we should change?

Here is the message in the kernel log:

[330573.697121] INFO: task nfsd:1376 blocked for more than 120 seconds.
[330573.708375] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[330573.730773] nfsd          D 0000000000000001     0  1376      2 0x00000000
[330573.730776]  ffff88061c21bdc0 0000000000000046 0000000000015f00 0000000000015f00
[330573.730779]  ffff88061c111ad0 ffff88061c21bfd8 0000000000015f00 ffff88061c111700
[330573.730781]  0000000000015f00 ffff88061c21bfd8 0000000000015f00 ffff88061c111ad0
[330573.730784] Call Trace:
[330573.730788]  [<ffffffff81559e67>] __mutex_lock_slowpath+0x107/0x190
[330573.730796]  [<ffffffffa012300f>] ? svc_authorise+0x3f/0x50 [sunrpc]
[330573.730799]  [<ffffffff81559863>] mutex_lock+0x23/0x50
[330573.730807]  [<ffffffffa012d478>] svc_send+0x58/0xe0 [sunrpc]
[330573.730809]  [<ffffffff8105df90>] ? default_wake_function+0x0/0x20
[330573.730817]  [<ffffffffa011faec>] svc_process+0x11c/0x150 [sunrpc]
[330573.730821]  [<ffffffffa0184ae5>] nfsd+0xc5/0x170 [nfsd]
[330573.730830]  [<ffffffffa0184a20>] ? nfsd+0x0/0x170 [nfsd]
[330573.730832]  [<ffffffff81085db6>] kthread+0x96/0xa0
[330573.730835]  [<ffffffff810141aa>] child_rip+0xa/0x20
[330573.730837]  [<ffffffff81085d20>] ? kthread+0x0/0xa0
[330573.730839]  [<ffffffff810141a0>] ? child_rip+0x0/0x20


-- 
Christoph Bartoschek
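The hung-task report above has a fixed one-line format. As a minimal sketch (not part of the original report), the Python below scans a kernel log for such reports so that their times can be compared with the client stalls. The regular expression is written against the line quoted above; find_hung_tasks is a hypothetical helper, and the log path in the usage comment is an assumption.

```python
import re
import sys

# Matches hung-task reports such as:
#   [330573.697121] INFO: task nfsd:1376 blocked for more than 120 seconds.
# re.search (not re.match) also accepts syslog prefixes like "Mar 31 ... kernel: ".
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(lines):
    """Return (seconds_since_boot, command, pid, timeout) per hung-task report."""
    reports = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            reports.append(
                (float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
            )
    return reports


if __name__ == "__main__":
    # Hypothetical usage: python find_hung.py /var/log/kern.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for ts, comm, pid, secs in find_hung_tasks(f):
                print(f"{ts:.6f}  {comm} (pid {pid}) blocked > {secs}s")
```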



* Re: nfsd hangs for more than 120 seconds
From: Myklebust, Trond @ 2012-03-31 15:01 UTC
  To: Christoph Bartoschek; +Cc: linux-nfs

On Sat, 2012-03-31 at 13:55 +0200, Christoph Bartoschek wrote:
> Hi,
> 
> we use Ubuntu 10.04.3 LTS and often get a traceback for NFS indicating that 
> the daemon hangs for several seconds. At the same time some client machines 
> cannot access the server and have to wait. After some minutes everything 
> goes on.
> 
> What could cause the problem? Is there anything we should change?
> 
> Here is the message in the kernel log:
> 
> [330573.697121] INFO: task nfsd:1376 blocked for more than 120 seconds.
> [330573.708375] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [330573.730773] nfsd          D 0000000000000001     0  1376      2 0x00000000
> [330573.730776]  ffff88061c21bdc0 0000000000000046 0000000000015f00 0000000000015f00
> [330573.730779]  ffff88061c111ad0 ffff88061c21bfd8 0000000000015f00 ffff88061c111700
> [330573.730781]  0000000000015f00 ffff88061c21bfd8 0000000000015f00 ffff88061c111ad0
> [330573.730784] Call Trace:
> [330573.730788]  [<ffffffff81559e67>] __mutex_lock_slowpath+0x107/0x190
> [330573.730796]  [<ffffffffa012300f>] ? svc_authorise+0x3f/0x50 [sunrpc]

At a guess, I'd say that your mountd or rpc.svcgssd is probably
busy/hanging, causing the kernel NFS daemon to hang while it waits to
authorise a client or user. Typically, you will see the above in the
case of a kerberos, NIS or ldap outage.

So are you using NIS or ldap-based netgroups in your /etc/exports, or
are your clients perhaps mounting with sec=krb5?

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
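Trond's explanation points at user-space lookups: nfsd waits while mountd or rpc.svcgssd authorises a client or user, and those helpers can stall when Kerberos, NIS or LDAP is slow. As a rough check from the server side, the Python sketch below times the passwd and group-list lookups for given users through the C library's name-service switch, which is where NIS or LDAP would be consulted if /etc/nsswitch.conf lists them. It is only a proxy: it does not reproduce mountd's own upcalls (for example netgroup matching), and timed and check_user are hypothetical helpers.

```python
import os
import pwd
import sys
import time


def timed(fn, *args):
    """Call fn(*args) and return (result_or_exception, elapsed_seconds)."""
    start = time.monotonic()
    try:
        result = fn(*args)
    except Exception as exc:  # a lookup that fails slowly is also of interest
        result = exc
    return result, time.monotonic() - start


def check_user(name):
    """Time the passwd and group-list lookups for one user via libc/NSS."""
    pw, t_pw = timed(pwd.getpwnam, name)
    report = {"getpwnam_s": t_pw}
    if isinstance(pw, pwd.struct_passwd):
        # "groups" holds either the group list or the exception raised.
        groups, t_gl = timed(os.getgrouplist, name, pw.pw_gid)
        report["getgrouplist_s"] = t_gl
        report["groups"] = groups
    else:
        report["error"] = repr(pw)
    return report


if __name__ == "__main__":
    # Hypothetical usage: python nss_timing.py alice bob
    for user in sys.argv[1:]:
        print(user, check_user(user))
```

Running it periodically (for example from cron) during one of the stalls would show whether the lookups themselves become slow.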


* Re: nfsd hangs for more than 120 seconds
From: Christoph Bartoschek @ 2012-03-31 16:17 UTC
  To: linux-nfs

Myklebust, Trond wrote:

> On Sat, 2012-03-31 at 13:55 +0200, Christoph Bartoschek wrote:
>> Hi,
>> 
>> we use Ubuntu 10.04.3 LTS and often get a traceback for NFS indicating
>> that the daemon hangs for several seconds. At the same time some client
>> machines cannot access the server and have to wait. After some minutes
>> everything goes on.
>> 
>> What could cause the problem? Is there anything we should change?
>> 
>> Here is the message in the kernel log:
>> 
>> [330573.697121] INFO: task nfsd:1376 blocked for more than 120 seconds.
>> [330573.708375] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [330573.730773] nfsd          D 0000000000000001     0  1376      2 0x00000000
>> [330573.730776]  ffff88061c21bdc0 0000000000000046 0000000000015f00 0000000000015f00
>> [330573.730779]  ffff88061c111ad0 ffff88061c21bfd8 0000000000015f00 ffff88061c111700
>> [330573.730781]  0000000000015f00 ffff88061c21bfd8 0000000000015f00 ffff88061c111ad0
>> [330573.730784] Call Trace:
>> [330573.730788]  [<ffffffff81559e67>] __mutex_lock_slowpath+0x107/0x190
>> [330573.730796]  [<ffffffffa012300f>] ? svc_authorise+0x3f/0x50 [sunrpc]
> 
> At a guess, I'd say that your mountd or rpc.svcgssd is probably
> busy/hanging, causing the kernel NFS daemon to hang while it waits to
> authorise a client or user. Typically, you will see the above in the
> case of a kerberos, NIS or ldap outage.
> 
> So are you using NIS or ldap-based netgroups in your /etc/exports, or
> are your clients perhaps mounting with sec=krb5?

We are still using NFS3 and NIS. 

We are also sometimes seeing the following problem that might be related:

One user suddenly has no access to a directory and its subdirectories on an 
NFS share. The user always gets "permission denied". The access bits and 
group memberships did not change.

At the same time all other users within the same groups can access the 
directory on the same client machine and on other client machines.

After about 15 minutes the problem vanishes by itself. The user no longer 
gets "permission denied" and everything is normal.

This happens about twice a week for different users. We see no pattern in 
which user is affected and when this happens.

Thanks 
Christoph
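The permission problem described above disappears by itself after about 15 minutes. One hypothesis consistent with that (not confirmed anywhere in this thread) is that a failed lookup gets cached and is only retried once its cache entry expires; the next reply describes a similar effect with LDAP. The toy Python sketch below only illustrates that mechanism: TTLCache is hypothetical, and the 900-second TTL is chosen to mirror the reported 15 minutes, not taken from nfsd or mountd.

```python
import time


class TTLCache:
    """Toy cache that remembers answers *and* failures for `ttl` seconds.

    Hypothetical illustration; it does not model any real NFS cache.
    """

    def __init__(self, lookup, ttl, clock=time.monotonic):
        self.lookup = lookup  # key -> value; raises LookupError on failure
        self.ttl = ttl
        self.clock = clock
        self.entries = {}     # key -> (expiry_time, value or None if failed)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]   # served from cache, even a cached failure
        try:
            value = self.lookup(key)
        except LookupError:
            value = None      # negative entry: the failure itself is cached
        self.entries[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    fake_now = [0.0]
    attempts = []

    def flaky_lookup(user):
        attempts.append(user)
        if len(attempts) == 1:  # simulate a single NIS timeout
            raise KeyError(user)
        return ["staff"]

    cache = TTLCache(flaky_lookup, ttl=900, clock=lambda: fake_now[0])
    for t in (0, 600, 901):
        fake_now[0] = t
        print(t, cache.get("alice"))
```

Run as a script, it prints None at t=0 and t=600 and ['staff'] at t=901: one failed lookup keeps producing the failure until the entry expires, even though the backend recovered immediately.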



* Re: nfsd hangs for more than 120 seconds
From: 夜神 岩男 @ 2012-03-31 17:13 UTC
  To: linux-nfs



--- On Sun, 2012/4/1, Christoph Bartoschek <ponto@pontohonk.de> wrote:

> Myklebust, Trond wrote:
> 
> > On Sat, 2012-03-31 at 13:55 +0200, Christoph Bartoschek wrote:
> >> Hi,
> >> 
> >> we use Ubuntu 10.04.3 LTS and often get a traceback for NFS indicating
> >> that the daemon hangs for several seconds. At the same time some client
> >> machines cannot access the server and have to wait. After some minutes
> >> everything goes on.
> >> 
> >> What could cause the problem? Is there anything we should change?

> > At a guess, I'd say that your mountd or rpc.svcgssd is probably
> > busy/hanging, causing the kernel NFS daemon to hang while it waits to
> > authorise a client or user. Typically, you will see the above in the
> > case of a kerberos, NIS or ldap outage.
> > 
> > So are you using NIS or ldap-based netgroups in your /etc/exports, or
> > are your clients perhaps mounting with sec=krb5?

> We are still using NFS3 and NIS. 
> 
> We are also sometimes seeing the following problem that might be related:
> 
> One user suddenly has no access to a directory and its subdirectories on an 
> NFS share. The user always gets "permission denied". The access bits and 
> group memberships did not change.
> 
> At the same time all other users within the same groups can access the 
> directory on the same client machine and on other client machines.
> 
> After about 15 minutes the problem vanishes by itself. The user no longer 
> gets "permission denied" and everything is normal.
> 
> This happens about twice a week for different users. We see no pattern in 
> which user is affected and when this happens.

That sounds a lot like an NIS lookup problem. I've been experiencing hangs (not quite 120 seconds, but over a minute at times, and really annoying) with NFS4 even with an export set this way:

/mnt/export/home *.subdomain.localnet(rw,fsid=0,insecure)
/mnt/export/home *.subdomain.localnet(rw,nohide,insecure)

But ours is universal, not limited to a single user. When LDAP was sketchy, a single user or a few users would sometimes not get a complete listing of, say, the uid:gid owners under /home/*, and that user could not access anything that hadn't come back in the listing before it failed, until the cache cleared.

That has since been sorted out. The only applications completely crippled by the current mystery problem are email clients such as Thunderbird and Evolution; new requests seem to pass through fine (for example, opening a new file-browser instance or browsing in Bash works). I have yet to figure that out.
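The export lines quoted above use the exports(5) syntax `path client(options)`, with options separated by commas, so a separator typo (for example a '.' typed where a ',' belongs) turns two options into one unrecognised name. The Python sketch below splits a line into clients and options and flags option names it does not recognise. It is a simplified, hypothetical checker: KNOWN_OPTIONS is a partial list of exports(5) option names chosen for illustration, and quoting and backslash continuations are not handled.

```python
import re

# Partial list of option names from exports(5); extend as needed.
KNOWN_OPTIONS = {
    "rw", "ro", "sync", "async", "secure", "insecure", "wdelay", "no_wdelay",
    "hide", "nohide", "crossmnt", "subtree_check", "no_subtree_check",
    "root_squash", "no_root_squash", "all_squash", "no_all_squash",
    "anonuid", "anongid", "fsid", "sec", "mountpoint", "mp",
}

# "host(opt1,opt2)" or a bare "host"
CLIENT_RE = re.compile(r"^(?P<host>[^()\s]*)(?:\((?P<opts>[^)]*)\))?$")


def parse_exports_line(line):
    """Split one exports line into (path, [(host, [options])]), or None."""
    line = line.split("#", 1)[0].strip()  # drop comments and blank lines
    if not line:
        return None
    path, *clients = line.split()
    parsed = []
    for spec in clients:
        m = CLIENT_RE.match(spec)
        if m is None:
            raise ValueError(f"cannot parse client spec {spec!r}")
        opts = m["opts"].split(",") if m["opts"] else []
        parsed.append((m["host"], opts))
    return path, parsed


def unknown_options(parsed):
    """Return options whose name (text before '=') is not in KNOWN_OPTIONS."""
    _, clients = parsed
    return [opt for _, opts in clients for opt in opts
            if opt.split("=", 1)[0] not in KNOWN_OPTIONS]
```

For example, `unknown_options(parse_exports_line("/srv *.example.net(rw.fsid=0,insecure)"))` reports `rw.fsid=0`, while the same line written with `rw,fsid=0` reports nothing.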


* Re: nfsd hangs for more than 120 seconds
From: Christoph Bartoschek @ 2012-04-04 13:58 UTC
  To: linux-nfs

I have just noticed that the clock on our NIS server was several minutes off. 
Could this be the reason for our problems?

I know that Kerberos needs accurate time, but is this also the case for NIS?

Christoph
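The thread leaves open whether NIS is sensitive to clock skew, but the skew itself is easy to quantify. The sketch below implements the standard NTP on-wire calculation (RFC 5905) of clock offset and round-trip delay from the four timestamps of one request/response exchange. It assumes those timestamps have already been collected, for example by an NTP client; ntp_offset_delay is a hypothetical helper name, and the timestamps in the usage block are made up.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Return (offset, delay) for one NTP-style exchange, per RFC 5905.

    t1: request sent      (client clock)
    t2: request received  (server clock)
    t3: reply sent        (server clock)
    t4: reply received    (client clock)
    A positive offset means the server's clock is ahead of the client's.
    The offset estimate assumes the network delay is symmetric.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Made-up timestamps in seconds: server about 3 minutes ahead, ~20 ms round trip.
    offset, delay = ntp_offset_delay(0.000, 180.010, 180.011, 0.021)
    print(f"offset {offset:+.3f} s, delay {delay * 1000:.1f} ms")
```

Subtracting the server's processing time (t3 - t2) from the total elapsed time is what keeps a slow server from being mistaken for a slow network.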


