* 3.4. sunrpc oops during shutdown
@ 2012-05-21 17:14 Dave Jones
  2012-05-21 18:03   ` Myklebust, Trond
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jones @ 2012-05-21 17:14 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, Linux Kernel

Tried to shut down a machine, got this, and a bunch of hung processes.
There was one NFS mount mounted at the time.

	Dave

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
PGD 1434c4067 PUD 144964067 PMD 0 
Oops: 0000 [#1] PREEMPT SMP 
CPU 4 
Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6

Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13 
RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
Stack:
 ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
 0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
 ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
Call Trace:
 [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
 [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
 [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
 [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
 [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
 [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
 [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
 [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
 [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
 [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
 [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
 [<ffffffff811ccc16>] mntput+0x26/0x40
 [<ffffffff811cd597>] release_mounts+0x77/0x90
 [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
 [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
 [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
 [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
 [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
 [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
 [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
 [<ffffffff8104f377>] sys_exit_group+0x17/0x20
 [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89 
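
As a reading aid for the trace above: the `<4c>` marker in the Code: line flags the first byte of the trapping instruction, and the fault sits at svc_destroy+0x1f, that is 0x1f bytes into a 0x140-byte function. The following is a minimal sketch, assuming standard x86-64 instruction encoding and handling only the single form that appears here (REX prefix, opcode 0x8b, ModRM byte with an 8-bit displacement). The helper name `decode_mov_load` is hypothetical; the byte values and register contents are copied from the oops.

```python
# Hand-decode the trapping instruction from the "Code:" line of the oops.
# Only the form seen here is handled: REX prefix + 0x8b (MOV r64, r/m64)
# + ModRM with mod=01 (register base plus 8-bit displacement, no SIB byte).

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code):
    """Decode 'REX 8b ModRM disp8' into (dest, base, displacement)."""
    rex, opcode, modrm, disp8 = code[:4]
    if rex & 0xF0 != 0x40 or opcode != 0x8B:
        raise ValueError("not a REX-prefixed MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    dest = REG64[reg | ((rex & 0x4) << 1)]   # REX.R extends the reg field
    base = REG64[rm | ((rex & 0x1) << 3)]    # REX.B extends the r/m field
    disp = disp8 - 256 if disp8 >= 0x80 else disp8  # sign-extend disp8
    return dest, base, disp


# Bytes starting at the <4c> marker, plus values taken from the register dump.
trapping = bytes.fromhex("4c8b6028")
registers = {"rax": 0x0, "rdi": 0xFFFF880142CD41A0}
cr2 = 0x28

dest, base, disp = decode_mov_load(trapping)
fault_addr = (registers[base] + disp) & 0xFFFFFFFFFFFFFFFF
print(f"mov {disp:#x}(%{base}),%{dest} -> fault address {fault_addr:#x}")
assert fault_addr == cr2
```

RAX is 0 in the register dump, so the load reads offset 0x28 of a NULL pointer, which matches CR2. Which structure that pointer refers to cannot be read off the bytes alone; it depends on the layout of the kernel build that produced the oops.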


* Re: 3.4. sunrpc oops during shutdown
  2012-05-21 17:14 3.4. sunrpc oops during shutdown Dave Jones
@ 2012-05-21 18:03   ` Myklebust, Trond
  0 siblings, 0 replies; 16+ messages in thread
From: Myklebust, Trond @ 2012-05-21 18:03 UTC (permalink / raw)
  To: Dave Jones; +Cc: bfields, linux-nfs, Linux Kernel

On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
> Tried to shut down a machine, got this, and a bunch of hung processes.
> There was one NFS mount mounted at the time.
> 
> 	Dave
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> PGD 1434c4067 PUD 144964067 PMD 0 
> Oops: 0000 [#1] PREEMPT SMP 
> CPU 4 
> Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> 
> Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13 
> RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
> RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
> RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
> R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
> FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
> Stack:
>  ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
>  0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
>  ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
> Call Trace:
>  [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
>  [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
>  [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
>  [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
>  [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
>  [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
>  [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
>  [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
>  [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
>  [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
>  [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
>  [<ffffffff811ccc16>] mntput+0x26/0x40
>  [<ffffffff811cd597>] release_mounts+0x77/0x90
>  [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
>  [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
>  [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
>  [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
>  [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
>  [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
>  [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
>  [<ffffffff8104f377>] sys_exit_group+0x17/0x20
>  [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
> Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89 

Aside from the fact that the current net_namespace is not guaranteed to
exist when we are called from free_nsproxy, svc_destroy() looks
seriously broken:

      * On the one hand it is trying to free struct svc_serv (and
        presumably all structures owned by struct svc_serv).
      * On the other hand, it tries to pass a parameter to
        svc_close_net() saying "please don't free structures on my
        sv_tempsocks, or sv_permsocks list unless they match this net
        namespace".
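
The tension between the two bullets can be made concrete with a toy model. This is a sketch under stated assumptions, not kernel code: `Serv`, `Xprt`, `close_net` and `destroy` are hypothetical stand-ins for struct svc_serv, its transports, svc_close_net() and svc_destroy(), and the only behaviour modelled is the one described above (free the whole service, but close only the transports that match one net namespace).

```python
# Toy model, not kernel code: every name below is hypothetical.
from dataclasses import dataclass, field


@dataclass
class Xprt:
    net: str                 # namespace the transport belongs to
    closed: bool = False


@dataclass
class Serv:
    permsocks: list = field(default_factory=list)
    tempsocks: list = field(default_factory=list)
    freed: bool = False


def close_net(serv, net):
    """Close and unlink only the transports that belong to `net`."""
    for socks in (serv.permsocks, serv.tempsocks):
        for xprt in socks:
            if xprt.net == net:
                xprt.closed = True
        socks[:] = [x for x in socks if x.net != net]


def destroy(serv, net):
    """Free the whole service after closing one namespace's transports.

    Returns the transports that were still linked when the service went away.
    """
    close_net(serv, net)
    leftovers = serv.permsocks + serv.tempsocks
    serv.freed = True
    return leftovers


serv = Serv(permsocks=[Xprt("netA"), Xprt("netB")], tempsocks=[Xprt("netA")])
leftovers = destroy(serv, "netA")
print(serv.freed, [(x.net, x.closed) for x in leftovers])
```

In this model the service is marked freed while the "netB" transport is still open and still on its list, which is the inconsistency the two bullets point at.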

Bruce, how is this supposed to be working?

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

* Re: 3.4. sunrpc oops during shutdown
  2012-05-21 18:03   ` Myklebust, Trond
  (?)
@ 2012-05-21 21:34   ` bfields
  -1 siblings, 0 replies; 16+ messages in thread
From: bfields @ 2012-05-21 21:34 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Dave Jones, linux-nfs, Linux Kernel

On Mon, May 21, 2012 at 06:03:43PM +0000, Myklebust, Trond wrote:
> On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
> > Tried to shut down a machine, got this, and a bunch of hung processes.
> > There was one NFS mount mounted at the time.
> > 
> > 	Dave
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> > PGD 1434c4067 PUD 144964067 PMD 0 
> > Oops: 0000 [#1] PREEMPT SMP 
> > CPU 4 
> > Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> > 
> > Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13 
> > RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> > RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
> > RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
> > RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
> > RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
> > R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
> > FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
> > Stack:
> >  ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
> >  0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
> >  ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
> > Call Trace:
> >  [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> >  [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
> >  [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
> >  [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
> >  [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
> >  [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
> >  [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
> >  [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
> >  [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
> >  [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
> >  [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
> >  [<ffffffff811ccc16>] mntput+0x26/0x40
> >  [<ffffffff811cd597>] release_mounts+0x77/0x90
> >  [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
> >  [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
> >  [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
> >  [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
> >  [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
> >  [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
> >  [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
> >  [<ffffffff8104f377>] sys_exit_group+0x17/0x20
> >  [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
> > Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89 
> 
> Aside from the fact that the current net_namespace is not guaranteed to
> exist when we are called from free_nsproxy, svc_destroy() looks
> seriously broken:
> 
>       * On the one hand it is trying to free struct svc_serv (and
>         presumably all structures owned by struct svc_serv).
>       * On the other hand, it tries to pass a parameter to
>         svc_close_net() saying "please don't free structures on my
>         sv_tempsocks, or sv_permsocks list unless they match this net
>         namespace".
> 
> Bruce, how is this supposed to be working?

I'm not sure, I'll try to take a look tomorrow....

I notice Stanislav has posted a "[PATCH] NFSd: set nfsd_serv to NULL
after service destruction", but I haven't reviewed it yet.

--b.

* Re: 3.4. sunrpc oops during shutdown
  2012-05-21 18:03   ` Myklebust, Trond
  (?)
  (?)
@ 2012-05-24 15:55   ` bfields
  2012-05-24 19:20       ` Myklebust, Trond
  -1 siblings, 1 reply; 16+ messages in thread
From: bfields @ 2012-05-24 15:55 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Dave Jones, linux-nfs, Linux Kernel

On Mon, May 21, 2012 at 06:03:43PM +0000, Myklebust, Trond wrote:
> On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
> > Tried to shut down a machine, got this, and a bunch of hung processes.
> > There was one NFS mount mounted at the time.
> > 
> > 	Dave
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> > PGD 1434c4067 PUD 144964067 PMD 0 
> > Oops: 0000 [#1] PREEMPT SMP 
> > CPU 4 
> > Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> > 
> > Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13 
> > RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> > RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
> > RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
> > RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
> > RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
> > R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
> > FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
> > Stack:
> >  ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
> >  0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
> >  ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
> > Call Trace:
> >  [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> >  [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
> >  [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
> >  [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
> >  [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
> >  [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
> >  [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
> >  [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
> >  [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
> >  [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
> >  [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
> >  [<ffffffff811ccc16>] mntput+0x26/0x40
> >  [<ffffffff811cd597>] release_mounts+0x77/0x90
> >  [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
> >  [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
> >  [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
> >  [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
> >  [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
> >  [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
> >  [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
> >  [<ffffffff8104f377>] sys_exit_group+0x17/0x20
> >  [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
> > Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89 
> 
> Aside from the fact that the current net_namespace is not guaranteed to
> exist when we are called from free_nsproxy, svc_destroy() looks
> seriously broken:
> 
>       * On the one hand it is trying to free struct svc_serv (and
>         presumably all structures owned by struct svc_serv).
>       * On the other hand, it tries to pass a parameter to
>         svc_close_net() saying "please don't free structures on my
>         sv_tempsocks, or sv_permsocks list unless they match this net
>         namespace".
> 
> Bruce, how is this supposed to be working?

Yeah, I don't know.

For the nfs callback case, it looks like you've just got a single 
callback service shared across all namespaces, and all you want to do 
is destroy that whole thing on last put; or is it more complicated than
that?

For the other servers at least the per-net and global parts of the 
server seem too entangled.

That's unavoidable to some degree since we're sharing threads among the
namespaces.  But maybe separate structures for the per-namespace and
global pieces would help.

At a minimum the per-namespace piece would keep a count of the users in
that namespace.

To make the shutdown race-free I think we also need a way to wait for
all threads processing requests in that namespace, which I don't see
that we have yet.
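
One possible shape for that split, as a rough sketch only: a global object shared by all namespaces, a per-namespace object that counts its users, and a shutdown path that waits for requests still running in that namespace. Everything here is an assumption made for illustration (the class names `SharedService` and `PerNetState`, the method names, and the use of a condition variable); it models the idea in user space and does not handle a new user arriving while the last namespace is being torn down.

```python
# Toy model, not kernel code: every class and method name is hypothetical.
import threading


class PerNetState:
    """Per-namespace piece: a user count plus in-flight request tracking."""

    def __init__(self):
        self.users = 0
        self.in_flight = 0
        self.closing = False
        self.cond = threading.Condition()


class SharedService:
    """Global piece: one service (and thread pool) shared by all namespaces."""

    def __init__(self):
        self.lock = threading.Lock()
        self.per_net = {}
        self.destroyed = False

    def get(self, net):
        with self.lock:
            self.per_net.setdefault(net, PerNetState()).users += 1

    def begin_request(self, net):
        """Worker entry point; returns None if the namespace is gone or closing."""
        with self.lock:
            state = self.per_net.get(net)
        if state is None:
            return None
        with state.cond:
            if state.closing:
                return None
            state.in_flight += 1
            return state

    def end_request(self, state):
        with state.cond:
            state.in_flight -= 1
            state.cond.notify_all()

    def put(self, net):
        with self.lock:
            state = self.per_net[net]
            state.users -= 1
            if state.users:
                return                      # other users remain in this namespace
            del self.per_net[net]
            last = not self.per_net         # was this the last namespace?
        with state.cond:
            state.closing = True            # refuse new requests ...
            while state.in_flight:          # ... and wait out the running ones
                state.cond.wait()
        if last:
            self.destroyed = True           # only now tear down the global piece


svc = SharedService()
svc.get("netA")
svc.get("netB")
req = svc.begin_request("netA")
closer = threading.Thread(target=svc.put, args=("netA",))
closer.start()          # put() cannot return until the request below ends
svc.end_request(req)
closer.join()
svc.put("netB")
print(svc.destroyed)
```

The per-namespace count decides when a namespace's resources can go, and the in-flight counter is what lets the last put in a namespace wait for threads that are still processing requests there.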

--b.

* Re: 3.4. sunrpc oops during shutdown
  2012-05-24 15:55   ` bfields
@ 2012-05-24 19:20       ` Myklebust, Trond
  0 siblings, 0 replies; 16+ messages in thread
From: Myklebust, Trond @ 2012-05-24 19:20 UTC (permalink / raw)
  To: bfields; +Cc: Dave Jones, linux-nfs, Linux Kernel

On Thu, 2012-05-24 at 11:55 -0400, bfields@fieldses.org wrote:
> On Mon, May 21, 2012 at 06:03:43PM +0000, Myklebust, Trond wrote:
> > On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
> > > Tried to shut down a machine, got this, and a bunch of hung processes.
> > > There was one NFS mount mounted at the time.
> > > 
> > > 	Dave
> > > 
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > > IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> > > PGD 1434c4067 PUD 144964067 PMD 0 
> > > Oops: 0000 [#1] PREEMPT SMP 
> > > CPU 4 
> > > Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> > > 
> > > Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13 
> > > RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> > > RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
> > > RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
> > > RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
> > > RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
> > > R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
> > > FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
> > > Stack:
> > >  ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
> > >  0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
> > >  ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
> > > Call Trace:
> > >  [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> > >  [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
> > >  [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
> > >  [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
> > >  [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
> > >  [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
> > >  [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
> > >  [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
> > >  [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
> > >  [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
> > >  [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
> > >  [<ffffffff811ccc16>] mntput+0x26/0x40
> > >  [<ffffffff811cd597>] release_mounts+0x77/0x90
> > >  [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
> > >  [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
> > >  [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
> > >  [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
> > >  [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
> > >  [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
> > >  [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
> > >  [<ffffffff8104f377>] sys_exit_group+0x17/0x20
> > >  [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
> > > Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89 
> > 
> > Aside from the fact that the current net_namespace is not guaranteed to
> > exist when we are called from free_nsproxy, svc_destroy() looks
> > seriously broken:
> > 
> >       * On the one hand it is trying to free struct svc_serv (and
> >         presumably all structures owned by struct svc_serv).
> >       * On the other hand, it tries to pass a parameter to
> >         svc_close_net() saying "please don't free structures on my
> >         sv_tempsocks, or sv_permsocks list unless they match this net
> >         namespace".
> > 
> > Bruce, how is this supposed to be working?
> 
> Yeah, I don't know.
> 
> For the nfs callback case, it looks like you've just got a single 
> callback service shared across all namespaces, and all you want to do 
> is destroy that whole thing on last put; or is it more complicated than
> that?

For NFSv4, I need to create sockets for the same net namespace as the
struct nfs_client is running in. When all the struct nfs_clients on that
net namespace are destroyed, I would ideally get rid of those sockets.

For NFSv4.1, all I want to do is create a back channel using the same
socket as the struct nfs_client.

> For the other servers at least the per-net and global parts of the 
> server seem too entangled.
> 
> That's unavoidable to some degree since we're sharing threads among the
> namespaces.  But maybe separate structures for the per-namespace and
> global pieces would help.
> 
> At a minimum the per-namespace piece would keep a count of the users in
> that namespace.
> 
> To make the shutdown race-free I think we also need a way to wait for
> all threads processing requests in that namespace, which I don't see
> that we have yet.


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
ZXNzIHRoZXkgbWF0Y2ggdGhpcyBuZXQNCj4gPiAgICAgICAgIG5hbWVzcGFjZSIuDQo+ID4gDQo+
ID4gQnJ1Y2UsIGhvdyBpcyB0aGlzIHN1cHBvc2VkIHRvIGJlIHdvcmtpbmc/DQo+IA0KPiBZZWFo
LCBJIGRvbid0IGtub3cuDQo+IA0KPiBGb3IgdGhlIG5mcyBjYWxsYmFjayBjYXNlLCBpdCBsb29r
cyBsaWtlIHlvdSd2ZSBqdXN0IGdvdCBhIHNpbmdsZSANCj4gY2FsbGJhY2sgc2VydmljZSBzaGFy
ZWQgYWNyb3NzIGFsbCBuYW1lc3BhY2VzLCBhbmQgYWxsIHlvdSB3YW50IHRvIGRvIA0KPiBpcyBk
ZXN0cm95IHRoYXQgd2hvbGUgdGhpbmcgb24gbGFzdCBwdXQ7IG9yIGlzIGl0IG1vcmUgY29tcGxp
Y2F0ZWQgdGhhbg0KPiB0aGF0Pw0KDQpGb3IgTkZTdjQsIEkgbmVlZCB0byBjcmVhdGUgc29ja2V0
cyBmb3IgdGhlIHNhbWUgbmV0IG5hbWVzcGFjZSBhcyB0aGUNCnN0cnVjdCBuZnNfY2xpZW50IGlz
IHJ1bm5pbmcgaW4uIFdoZW4gYWxsIHRoZSBzdHJ1Y3QgbmZzX2NsaWVudHMgb24gdGhhdA0KbmV0
IG5hbWVzcGFjZSBhcmUgZGVzdHJveWVkLCBJIHdvdWxkIGlkZWFsbHkgZ2V0IHJpZCBvZiB0aG9z
ZSBzb2NrZXRzLg0KDQpGb3IgTkZTdjQuMSwgYWxsIEkgd2FudCB0byBkbyBpcyBjcmVhdGUgYSBi
YWNrIGNoYW5uZWwgdXNpbmcgdGhlIHNhbWUNCnNvY2tldCBhcyB0aGUgc3RydWN0IG5mc19jbGll
bnQuDQoNCj4gRm9yIHRoZSBvdGhlciBzZXJ2ZXJzIGF0IGxlYXN0IHRoZSBwZXItbmV0IGFuZCBn
bG9iYWwgcGFydHMgb2YgdGhlIA0KPiBzZXJ2ZXIgc2VlbSB0b28gZW50YW5nbGVkLg0KPiANCj4g
VGhhdCdzIHVuYXZvaWRhYmxlIHRvIHNvbWUgZGVncmVlIHNpbmNlIHdlJ3JlIHNoYXJpbmcgdGhy
ZWFkcyBhbW9uZyB0aGUNCj4gbmFtZXNwYWNlcy4gIEJ1dCBtYXliZSBzZXBhcmF0ZSBzdHJ1Y3R1
cmVzIGZvciB0aGUgcGVyLW5hbWVzcGFjZSBhbmQNCj4gZ2xvYmFsIHBpZWNlcyB3b3VsZCBoZWxw
Lg0KPiANCj4gQXQgYSBtaW5pbXVtIHRoZSBwZXItbmFtZXNwYWNlIHBpZWNlIHdvdWxkIGtlZXAg
YSBjb3VudCBvZiB0aGUgdXNlcnMgaW4NCj4gdGhhdCBuYW1lc3BhY2UuDQo+IA0KPiBUbyBtYWtl
IHRoZSBzaHV0ZG93biByYWNlLWZyZWUgSSB0aGluayB3ZSBhbHNvIG5lZWQgYSB3YXkgdG8gd2Fp
dCBmb3INCj4gYWxsIHRocmVhZHMgcHJvY2Vzc2luZyByZXF1ZXN0cyBpbiB0aGF0IG5hbWVzcGFj
ZSwgd2hpY2ggSSBkb24ndCBzZWUNCj4gdGhhdCB3ZSBoYXZlIHlldC4NCg0KDQotLSANClRyb25k
IE15a2xlYnVzdA0KTGludXggTkZTIGNsaWVudCBtYWludGFpbmVyDQoNCk5ldEFwcA0KVHJvbmQu
TXlrbGVidXN0QG5ldGFwcC5jb20NCnd3dy5uZXRhcHAuY29tDQoNCg==

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 3.4. sunrpc oops during shutdown
  2012-05-24 19:20       ` Myklebust, Trond
  (?)
@ 2012-05-24 20:27       ` bfields
  -1 siblings, 0 replies; 16+ messages in thread
From: bfields @ 2012-05-24 20:27 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Dave Jones, linux-nfs, Linux Kernel, Stanislav Kinsbursky

On Thu, May 24, 2012 at 07:20:41PM +0000, Myklebust, Trond wrote:
> On Thu, 2012-05-24 at 11:55 -0400, bfields@fieldses.org wrote:
> > On Mon, May 21, 2012 at 06:03:43PM +0000, Myklebust, Trond wrote:
> > > On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
> > > > Tried to shutdown a machine, got this, and a bunch of hung processes.
> > > > There was one NFS mount mounted at the time.
> > > > 
> > > > 	Dave
> > > > 
> > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > > > IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> > > > PGD 1434c4067 PUD 144964067 PMD 0 
> > > > Oops: 0000 [#1] PREEMPT SMP 
> > > > CPU 4 
> > > > Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> > > > 
> > > > Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13 
> > > > RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> > > > RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
> > > > RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
> > > > RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
> > > > RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
> > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
> > > > R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
> > > > FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
> > > > Stack:
> > > >  ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
> > > >  0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
> > > >  ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
> > > > Call Trace:
> > > >  [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> > > >  [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
> > > >  [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
> > > >  [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
> > > >  [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
> > > >  [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
> > > >  [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
> > > >  [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
> > > >  [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
> > > >  [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
> > > >  [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
> > > >  [<ffffffff811ccc16>] mntput+0x26/0x40
> > > >  [<ffffffff811cd597>] release_mounts+0x77/0x90
> > > >  [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
> > > >  [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
> > > >  [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
> > > >  [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
> > > >  [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
> > > >  [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
> > > >  [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
> > > >  [<ffffffff8104f377>] sys_exit_group+0x17/0x20
> > > >  [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
> > > > Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89 
> > > 
> > > Aside from the fact that the current net_namespace is not guaranteed to
> > > exist when we are called from free_nsproxy, svc_destroy() looks
> > > seriously broken:
> > > 
> > >       * On the one hand it is trying to free struct svc_serv (and
> > >         presumably all structures owned by struct svc_serv).
> > >       * On the other hand, it tries to pass a parameter to
> > >         svc_close_net() saying "please don't free structures on my
> > >         sv_tempsocks, or sv_permsocks list unless they match this net
> > >         namespace".
> > > 
> > > Bruce, how is this supposed to be working?
> > 
> > Yeah, I don't know.
> > 
> > For the nfs callback case, it looks like you've just got a single 
> > callback service shared across all namespaces, and all you want to do 
> > is destroy that whole thing on last put; or is it more complicated than
> > that?
> 
> For NFSv4, I need to create sockets for the same net namespace as the
> struct nfs_client is running in. When all the struct nfs_clients on that
> net namespace are destroyed, I would ideally get rid of those sockets.
> 
> For NFSv4.1, all I want to do is create a back channel using the same
> socket as the struct nfs_client.

Thanks, makes sense.

Uh, I meant to cc: Stanislav on that last reply but didn't somehow.

--b.

> 
> > For the other servers at least the per-net and global parts of the 
> > server seem too entangled.
> > 
> > That's unavoidable to some degree since we're sharing threads among the
> > namespaces.  But maybe separate structures for the per-namespace and
> > global pieces would help.
> > 
> > At a minimum the per-namespace piece would keep a count of the users in
> > that namespace.
> > 
> > To make the shutdown race-free I think we also need a way to wait for
> > all threads processing requests in that namespace, which I don't see
> > that we have yet.
> 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 3.4. sunrpc oops during shutdown
  2012-05-21 18:03   ` Myklebust, Trond
                     ` (2 preceding siblings ...)
  (?)
@ 2012-05-25  8:12   ` Stanislav Kinsbursky
  2012-05-25 13:07       ` Myklebust, Trond
  -1 siblings, 1 reply; 16+ messages in thread
From: Stanislav Kinsbursky @ 2012-05-25  8:12 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Dave Jones, bfields, linux-nfs, Linux Kernel

On 21.05.2012 22:03, Myklebust, Trond wrote:
> On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
>> Tried to shutdown a machine, got this, and a bunch of hung processes.
>> There was one NFS mount mounted at the time.
>>
>> 	Dave
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
>> IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
>> PGD 1434c4067 PUD 144964067 PMD 0
>> Oops: 0000 [#1] PREEMPT SMP
>> CPU 4
>> Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
>>
>> Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13
>> RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
>> RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
>> RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
>> RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
>> RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
>> R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
>> FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
>> Stack:
>>   ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
>>   0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
>>   ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
>> Call Trace:
>>   [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
>>   [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
>>   [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
>>   [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
>>   [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
>>   [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
>>   [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
>>   [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
>>   [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
>>   [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
>>   [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
>>   [<ffffffff811ccc16>] mntput+0x26/0x40
>>   [<ffffffff811cd597>] release_mounts+0x77/0x90
>>   [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
>>   [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
>>   [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
>>   [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
>>   [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
>>   [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
>>   [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
>>   [<ffffffff8104f377>] sys_exit_group+0x17/0x20
>>   [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
>> Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89
>
> Aside from the fact that the current net_namespace is not guaranteed to
> exist when we are called from free_nsproxy, svc_destroy() looks
> seriously broken:

Trond, it looks like you are mistaken here.
Any process holds references to all the namespaces it belongs to (copy_net_ns()
increases the usage counter), and the network namespace is released after the
mount namespace in free_nsproxy().
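
The ordering argument above can be sketched in userspace. This is only a mock under stated assumptions: every mock_* name below is hypothetical and none of it is kernel code. It models just the two claims made here, that the proxy holds a reference on each namespace and that the network namespace reference is dropped after the mount namespace one.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock; names are hypothetical and do not match kernel types. */
struct mock_ns {
	int refcount;	/* references still held on this namespace */
	int alive;	/* cleared when the last reference is dropped */
};

struct mock_nsproxy {
	struct mock_ns *mnt_ns;
	struct mock_ns *net_ns;
};

/* Drop one reference; the namespace dies with its last reference. */
static void mock_put(struct mock_ns *ns)
{
	assert(ns->refcount > 0);
	if (--ns->refcount == 0)
		ns->alive = 0;
}

/* Records whether net_ns was still alive while the mount ns was torn down. */
static int net_alive_during_mnt_teardown = -1;

static void mock_put_mnt_ns(struct mock_nsproxy *proxy)
{
	/* In the trace above, the NFS unmount path runs at this point. */
	net_alive_during_mnt_teardown = proxy->net_ns->alive;
	mock_put(proxy->mnt_ns);
}

/* Release order as described: mount namespace first, network namespace last. */
static void mock_free_nsproxy(struct mock_nsproxy *proxy)
{
	mock_put_mnt_ns(proxy);
	mock_put(proxy->net_ns);
}
```

Calling mock_free_nsproxy() on a proxy whose two namespaces each hold one reference leaves net_alive_during_mnt_teardown set to 1: in this model the network namespace object outlives the unmount.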

>
>        * On the one hand it is trying to free struct svc_serv (and
>          presumably all structures owned by struct svc_serv).
>        * On the other hand, it tries to pass a parameter to
>          svc_close_net() saying "please don't free structures on my
>          sv_tempsocks, or sv_permsocks list unless they match this net
>          namespace".
>

I've sent patches that move svc_shutdown_net() out of svc_destroy() ("SUNRPC:
separate per-net data creation from service").
With this patch set, it's assumed that per-net resources will be created or
released prior to service creation and destruction.

> Bruce, how is this supposed to be working?
>
> Cheers
>    Trond


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 3.4. sunrpc oops during shutdown
  2012-05-25  8:12   ` Stanislav Kinsbursky
@ 2012-05-25 13:07       ` Myklebust, Trond
  0 siblings, 0 replies; 16+ messages in thread
From: Myklebust, Trond @ 2012-05-25 13:07 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: Dave Jones, bfields, linux-nfs, Linux Kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 4964 bytes --]

On Fri, 2012-05-25 at 12:12 +0400, Stanislav Kinsbursky wrote:
> On 21.05.2012 22:03, Myklebust, Trond wrote:
> > On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
> >> Tried to shutdown a machine, got this, and a bunch of hung processes.
> >> There was one NFS mount mounted at the time.
> >>
> >> 	Dave
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> >> IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> >> PGD 1434c4067 PUD 144964067 PMD 0
> >> Oops: 0000 [#1] PREEMPT SMP
> >> CPU 4
> >> Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> >>
> >> Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13
> >> RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> >> RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
> >> RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
> >> RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
> >> RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
> >> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
> >> R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
> >> FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
> >> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
> >> Stack:
> >>   ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
> >>   0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
> >>   ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
> >> Call Trace:
> >>   [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> >>   [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
> >>   [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
> >>   [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
> >>   [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
> >>   [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
> >>   [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
> >>   [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
> >>   [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
> >>   [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
> >>   [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
> >>   [<ffffffff811ccc16>] mntput+0x26/0x40
> >>   [<ffffffff811cd597>] release_mounts+0x77/0x90
> >>   [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
> >>   [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
> >>   [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
> >>   [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
> >>   [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
> >>   [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
> >>   [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
> >>   [<ffffffff8104f377>] sys_exit_group+0x17/0x20
> >>   [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
> >> Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89
> >
> > Aside from the fact that the current net_namespace is not guaranteed to
> > exist when we are called from free_nsproxy, svc_destroy() looks
> > seriously broken:
> 
> Trond, it looks like you are mistaken here.
> Any process holds references to all the namespaces it belongs to (copy_net_ns()
> increases the usage counter), and the network namespace is released after the
> mount namespace in free_nsproxy().

That doesn't help you though. switch_task_namespaces will have already
set current->nsproxy to NULL, which is why we Oops when we try to read
current->nsproxy->net_ns in svc_exit_thread().
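
A minimal userspace sketch of that failure, for illustration only. The mock_* names are hypothetical, and the struct layout is an assumption (LP64, a counter followed by five pointers) chosen because it is consistent with the faulting address 0x28 and RAX = 0 in the report, where the marked instruction bytes `4c 8b 60 28` decode to a load from 0x28(%rax).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mock of the assumed layout; not the kernel definition. */
struct mock_net;
struct mock_nsproxy {
	int count;		/* reference count, padded to 8 bytes on LP64 */
	void *uts_ns;
	void *ipc_ns;
	void *mnt_ns;
	void *pid_ns;
	struct mock_net *net_ns;
};

struct mock_task {
	struct mock_nsproxy *nsproxy;
};

/* Mirrors the step described above: the task is detached from its proxy
 * before the old proxy's namespaces are released. */
static struct mock_nsproxy *
mock_switch_task_namespaces(struct mock_task *t, struct mock_nsproxy *new_proxy)
{
	struct mock_nsproxy *old = t->nsproxy;

	t->nsproxy = new_proxy;
	return old;
}

/* Address a CPU would load from when evaluating t->nsproxy->net_ns. */
static size_t mock_net_ns_load_address(const struct mock_task *t)
{
	return (size_t)(uintptr_t)t->nsproxy +
	       offsetof(struct mock_nsproxy, net_ns);
}
```

With nsproxy switched to NULL, mock_net_ns_load_address() evaluates to 0x28 on an LP64 build, which matches the CR2 value in the oops. The sketch computes the address rather than dereferencing it, so it can run without faulting.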

> >
> >        * On the one hand it is trying to free struct svc_serv (and
> >          presumably all structures owned by struct svc_serv).
> >        * On the other hand, it tries to pass a parameter to
> >          svc_close_net() saying "please don't free structures on my
> >          sv_tempsocks, or sv_permsocks list unless they match this net
> >          namespace".
> >
> 
> I've sent patches that move svc_shutdown_net() out of svc_destroy() ("SUNRPC:
> separate per-net data creation from service").
> With this patch set, it's assumed that per-net resources will be created or
> released prior to service creation and destruction.

Are those patches appropriate for inclusion in the stable kernel series
so that we can fix 3.4?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 3.4. sunrpc oops during shutdown
  2012-05-25 13:07       ` Myklebust, Trond
  (?)
@ 2012-05-25 13:31       ` Stanislav Kinsbursky
  2012-05-28 23:43           ` Myklebust, Trond
  -1 siblings, 1 reply; 16+ messages in thread
From: Stanislav Kinsbursky @ 2012-05-25 13:31 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Dave Jones, bfields, linux-nfs, Linux Kernel

On 25.05.2012 17:07, Myklebust, Trond wrote:
> On Fri, 2012-05-25 at 12:12 +0400, Stanislav Kinsbursky wrote:
>> On 21.05.2012 22:03, Myklebust, Trond wrote:
>>> On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
>>>> Tried to shutdown a machine, got this, and a bunch of hung processes.
>>>> There was one NFS mount mounted at the time.
>>>>
>>>> 	Dave
>>>>
>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
>>>> IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
>>>> PGD 1434c4067 PUD 144964067 PMD 0
>>>> Oops: 0000 [#1] PREEMPT SMP
>>>> CPU 4
>>>> Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
>>>>
>>>> Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13
>>>> RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
>>>> RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
>>>> RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
>>>> RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
>>>> RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
>>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
>>>> R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
>>>> FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
>>>> Stack:
>>>>    ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
>>>>    0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
>>>>    ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
>>>> Call Trace:
>>>>    [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
>>>>    [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
>>>>    [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
>>>>    [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
>>>>    [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
>>>>    [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
>>>>    [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
>>>>    [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
>>>>    [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
>>>>    [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
>>>>    [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
>>>>    [<ffffffff811ccc16>] mntput+0x26/0x40
>>>>    [<ffffffff811cd597>] release_mounts+0x77/0x90
>>>>    [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
>>>>    [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
>>>>    [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
>>>>    [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
>>>>    [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
>>>>    [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
>>>>    [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
>>>>    [<ffffffff8104f377>] sys_exit_group+0x17/0x20
>>>>    [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
>>>> Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89
>>>
>>> Aside from the fact that the current net_namespace is not guaranteed to
>>> exist when we are called from free_nsproxy, svc_destroy() looks
>>> seriously broken:
>>
>> Trond, it looks like you are mistaken here.
>> Any process holds references to all the namespaces it belongs to (copy_net_ns()
>> increases the usage counter), and the network namespace is released after the
>> mount namespace in free_nsproxy().
>
> That doesn't help you though. switch_task_namespaces will have already
> set current->nsproxy to NULL, which is why we Oops when we try to read
> current->nsproxy->net_ns in svc_exit_thread().
>
>>>
>>>         * On the one hand it is trying to free struct svc_serv (and
>>>           presumably all structures owned by struct svc_serv).
>>>         * On the other hand, it tries to pass a parameter to
>>>           svc_close_net() saying "please don't free structures on my
>>>           sv_tempsocks, or sv_permsocks list unless they match this net
>>>           namespace".
>>>
>>
>> I've sent patches that move svc_shutdown_net() out of svc_destroy() ("SUNRPC:
>> separate per-net data creation from service").
>> With this patch set, it's assumed that per-net resources will be created or
>> released prior to service creation and destruction.
>
> Are those patches appropriate for inclusion in the stable kernel series
> so that we can fix 3.4?
>

Yes. But unfortunately, this won't be enough.
The "NFS: callback threads containerization" patch set is required as well.

As a bugfix, I can suggest the "SUNRPC: separate per-net data creation from
service" patch set plus passing a hard-coded "init_net" to the NFS callback
shutdown routines (instead of current->nsproxy->net_ns). This should work.
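
To make the failure mode and the suggested stopgap concrete, here is a minimal
userspace sketch in C. It is not kernel code: the structs are simplified
stand-ins for the kernel ones, and model_switch_task_namespaces(),
callback_net_unsafe() and callback_net_fixed() are hypothetical names used only
for illustration. It models just the ordering discussed in this thread: the
exit path clears the task's nsproxy pointer before the old namespaces are torn
down, so teardown code that reads current->nsproxy->net_ns from that path finds
NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct net { int id; };
struct nsproxy { struct net *net_ns; };
struct task { struct nsproxy *nsproxy; };

/* Stand-in for the kernel's initial network namespace. */
struct net init_net = { 0 };

/*
 * Models the exit path: the task's nsproxy pointer is replaced first
 * (with NULL on exit), and only afterwards would the caller release the
 * namespaces referenced by the returned old proxy.
 */
struct nsproxy *model_switch_task_namespaces(struct task *t,
                                             struct nsproxy *new_ns)
{
    struct nsproxy *old;

    assert(t != NULL);
    old = t->nsproxy;
    t->nsproxy = new_ns;
    return old;
}

/*
 * Looks up the net namespace through the task. In this model a NULL
 * return marks the point where the kernel would dereference NULL.
 */
struct net *callback_net_unsafe(const struct task *t)
{
    if (t->nsproxy == NULL)
        return NULL;
    return t->nsproxy->net_ns;
}

/* Stopgap from the discussion: do not consult the task at all. */
struct net *callback_net_fixed(void)
{
    return &init_net;
}
```

In this model, calling callback_net_unsafe() after
model_switch_task_namespaces(task, NULL) yields NULL, while
callback_net_fixed() keeps returning a valid pointer regardless of the task's
state.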



-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 3.4. sunrpc oops during shutdown
  2012-05-25 13:31       ` Stanislav Kinsbursky
@ 2012-05-28 23:43           ` Myklebust, Trond
  0 siblings, 0 replies; 16+ messages in thread
From: Myklebust, Trond @ 2012-05-28 23:43 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: Dave Jones, bfields, linux-nfs, Linux Kernel

On Fri, 2012-05-25 at 17:31 +0400, Stanislav Kinsbursky wrote:
> On 25.05.2012 17:07, Myklebust, Trond wrote:
> > On Fri, 2012-05-25 at 12:12 +0400, Stanislav Kinsbursky wrote:
> >> On 21.05.2012 22:03, Myklebust, Trond wrote:
> >>> On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
> >>>> Tried to shutdown a machine, got this, and a bunch of hung processes.
> >>>> There was one NFS mount mounted at the time.
> >>>>
> >>>> 	Dave
> >>>>
> >>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> >>>> IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> >>>> PGD 1434c4067 PUD 144964067 PMD 0
> >>>> Oops: 0000 [#1] PREEMPT SMP
> >>>> CPU 4
> >>>> Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> >>>>
> >>>> Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13
> >>>> RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
> >>>> RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
> >>>> RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
> >>>> RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
> >>>> RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
> >>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
> >>>> R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
> >>>> FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
> >>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>> CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
> >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>>> Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
> >>>> Stack:
> >>>>    ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
> >>>>    0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
> >>>>    ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
> >>>> Call Trace:
> >>>>    [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> >>>>    [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
> >>>>    [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
> >>>>    [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
> >>>>    [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
> >>>>    [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
> >>>>    [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
> >>>>    [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
> >>>>    [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
> >>>>    [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
> >>>>    [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
> >>>>    [<ffffffff811ccc16>] mntput+0x26/0x40
> >>>>    [<ffffffff811cd597>] release_mounts+0x77/0x90
> >>>>    [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
> >>>>    [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
> >>>>    [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
> >>>>    [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
> >>>>    [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
> >>>>    [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
> >>>>    [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
> >>>>    [<ffffffff8104f377>] sys_exit_group+0x17/0x20
> >>>>    [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
> >>>> Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89
> >>>
> >>> Aside from the fact that the current net_namespace is not guaranteed to
> >>> exist when we are called from free_nsproxy, svc_destroy() looks
> >>> seriously broken:
> >>
> >> Trond, looks like you are mistaken here.
> >> Any process holds references to all namespaces it belong to (copy_net_ns()
> >> increase usage counter). And network namespace is released after mount namespace
> >> in free_nsproxy.
> >
> > That doesn't help you though. switch_task_namespaces will have already
> > set current->nsproxy to NULL, which is why we Oops when we try to read
> > current->nsproxy->net_ns in svc_exit_thread().
> >
> >>>
> >>>         * On the one hand it is trying to free struct svc_serv (and
> >>>           presumably all structures owned by struct svc_serv).
> >>>         * On the other hand, it tries to pass a parameter to
> >>>           svc_close_net() saying "please don't free structures on my
> >>>           sv_tempsocks, or sv_permsocks list unless they match this net
> >>>           namespace".
> >>>
> >>
> >> I've sent patches, which moves svc_shutdown_net() from svc_destroy() ("SUNRPC:
> >> separate per-net data creation from service").
> >> with this patch set it's assumed, that per-net resources will be created or
> >> released prior to service creation and destruction.
> >
> > Are those patches appropriate for inclusion in the stable kernel series
> > so that we can fix 3.4?
> >
> 
> Yes. But unfortunately, this won't be enough.
> "NFS: callback threads containerization" patch set is required as well.
> 
> As a bugfix, I can suggest "SUNRPC: separate per-net data creation from service"
> patch set + pass hard-coded "init_net" for NFS callback shutdown routines 
> (instead of current->nsproxy->net_ns). This should work.

Hi Stanislav,

My question is why should svc_destroy() care about net namespaces at
all? Once an application is calling svc_destroy(), it is trying to close
down the entire service. It really should not matter to which net
namespace a particular socket belongs: they _all_ need to be destroyed.

Cheers,
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 3.4. sunrpc oops during shutdown
  2012-05-28 23:43           ` Myklebust, Trond
  (?)
@ 2012-05-29  8:48           ` Stanislav Kinsbursky
  -1 siblings, 0 replies; 16+ messages in thread
From: Stanislav Kinsbursky @ 2012-05-29  8:48 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Dave Jones, bfields, linux-nfs, Linux Kernel

On 29.05.2012 03:43, Myklebust, Trond wrote:
> On Fri, 2012-05-25 at 17:31 +0400, Stanislav Kinsbursky wrote:
>> On 25.05.2012 17:07, Myklebust, Trond wrote:
>>> On Fri, 2012-05-25 at 12:12 +0400, Stanislav Kinsbursky wrote:
>>>> On 21.05.2012 22:03, Myklebust, Trond wrote:
>>>>> On Mon, 2012-05-21 at 13:14 -0400, Dave Jones wrote:
>>>>>> Tried to shutdown a machine, got this, and a bunch of hung processes.
>>>>>> There was one NFS mount mounted at the time.
>>>>>>
>>>>>> 	Dave
>>>>>>
>>>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
>>>>>> IP: [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
>>>>>> PGD 1434c4067 PUD 144964067 PMD 0
>>>>>> Oops: 0000 [#1] PREEMPT SMP
>>>>>> CPU 4
>>>>>> Modules linked in: ip6table_filter(-) ip6_tables nfsd nfs fscache auth_rpcgss nfs_acl lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
>>>>>>
>>>>>> Pid: 6946, comm: ntpd Not tainted 3.4.0+ #13
>>>>>> RIP: 0010:[<ffffffffa01191df>]  [<ffffffffa01191df>] svc_destroy+0x1f/0x140 [sunrpc]
>>>>>> RSP: 0018:ffff880143c65c48  EFLAGS: 00010286
>>>>>> RAX: 0000000000000000 RBX: ffff880142cd41a0 RCX: 0000000000000006
>>>>>> RDX: 0000000000000040 RSI: ffff880143105028 RDI: ffff880142cd41a0
>>>>>> RBP: ffff880143c65c58 R08: 0000000000000000 R09: 0000000000000001
>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88013bc5a148
>>>>>> R13: ffff880140981658 R14: ffff880142cd41a0 R15: ffff880146c88000
>>>>>> FS:  00007fdc0382a740(0000) GS:ffff880149400000(0000) knlGS:0000000000000000
>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> CR2: 0000000000000028 CR3: 0000000036cbb000 CR4: 00000000001407e0
>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>> Process ntpd (pid: 6946, threadinfo ffff880143c64000, task ffff880143104940)
>>>>>> Stack:
>>>>>>     ffff880140981660 ffff88013bc5a148 ffff880143c65c88 ffffffffa01193a6
>>>>>>     0000000000000000 ffff88013e566020 ffff88013e565f28 ffff880146ee6ac0
>>>>>>     ffff880143c65ca8 ffffffffa024f403 ffff880143c65ca8 ffff880143d3a4f8
>>>>>> Call Trace:
>>>>>>     [<ffffffffa01193a6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
>>>>>>     [<ffffffffa024f403>] nfs_callback_down+0x53/0x90 [nfs]
>>>>>>     [<ffffffffa021642e>] nfs_free_client+0xfe/0x120 [nfs]
>>>>>>     [<ffffffffa02185df>] nfs_put_client+0x29f/0x420 [nfs]
>>>>>>     [<ffffffffa02184e0>] ? nfs_put_client+0x1a0/0x420 [nfs]
>>>>>>     [<ffffffffa021962f>] nfs_free_server+0x16f/0x2e0 [nfs]
>>>>>>     [<ffffffffa02194e3>] ? nfs_free_server+0x23/0x2e0 [nfs]
>>>>>>     [<ffffffffa022363c>] nfs4_kill_super+0x3c/0x50 [nfs]
>>>>>>     [<ffffffff811ad67c>] deactivate_locked_super+0x3c/0xa0
>>>>>>     [<ffffffff811ae29e>] deactivate_super+0x4e/0x70
>>>>>>     [<ffffffff811ccba4>] mntput_no_expire+0xb4/0x100
>>>>>>     [<ffffffff811ccc16>] mntput+0x26/0x40
>>>>>>     [<ffffffff811cd597>] release_mounts+0x77/0x90
>>>>>>     [<ffffffff811cefc6>] put_mnt_ns+0x66/0x80
>>>>>>     [<ffffffff81078dff>] free_nsproxy+0x1f/0xb0
>>>>>>     [<ffffffff8107905e>] switch_task_namespaces+0x5e/0x70
>>>>>>     [<ffffffff81079080>] exit_task_namespaces+0x10/0x20
>>>>>>     [<ffffffff8104e90e>] do_exit+0x4ee/0xb80
>>>>>>     [<ffffffff81639c0a>] ? retint_swapgs+0xe/0x13
>>>>>>     [<ffffffff8104f2ef>] do_group_exit+0x4f/0xc0
>>>>>>     [<ffffffff8104f377>] sys_exit_group+0x17/0x20
>>>>>>     [<ffffffff81641352>] system_call_fastpath+0x16/0x1b
>>>>>> Code: 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 90 55 48 89 e5 41 54 53 66 66 66 66 90 65 48 8b 04 25 80 ba 00 00 48 8b 80 50 05 00 00 48 89 fb <4c> 8b 60 28 8b 47 58 85 c0 0f 84 ec 00 00 00 83 e8 01 85 c0 89
>>>>>
>>>>> Aside from the fact that the current net_namespace is not guaranteed to
>>>>> exist when we are called from free_nsproxy, svc_destroy() looks
>>>>> seriously broken:
>>>>
>>>> Trond, looks like you are mistaken here.
>>>> Any process holds references to all namespaces it belong to (copy_net_ns()
>>>> increase usage counter). And network namespace is released after mount namespace
>>>> in free_nsproxy.
>>>
>>> That doesn't help you though. switch_task_namespaces will have already
>>> set current->nsproxy to NULL, which is why we Oops when we try to read
>>> current->nsproxy->net_ns in svc_exit_thread().
>>>
>>>>>
>>>>>          * On the one hand it is trying to free struct svc_serv (and
>>>>>            presumably all structures owned by struct svc_serv).
>>>>>          * On the other hand, it tries to pass a parameter to
>>>>>            svc_close_net() saying "please don't free structures on my
>>>>>            sv_tempsocks, or sv_permsocks list unless they match this net
>>>>>            namespace".
>>>>>
>>>>
>>>> I've sent patches, which moves svc_shutdown_net() from svc_destroy() ("SUNRPC:
>>>> separate per-net data creation from service").
>>>> with this patch set it's assumed, that per-net resources will be created or
>>>> released prior to service creation and destruction.
>>>
>>> Are those patches appropriate for inclusion in the stable kernel series
>>> so that we can fix 3.4?
>>>
>>
>> Yes. But unfortunately, this won't be enough.
>> "NFS: callback threads containerization" patch set is required as well.
>>
>> As a bugfix, I can suggest "SUNRPC: separate per-net data creation from service"
>> patch set + pass hard-coded "init_net" for NFS callback shutdown routines
>> (instead of current->nsproxy->net_ns). This should work.
>
> Hi Stanislav,
>
> My question is why should svc_destroy() care about net namespaces at
> all? Once an application is calling svc_destroy(), it is trying to close
> down the entire service. It really should not matter to which net
> namespace a particular socket belongs: they _all_ need to be destroyed.
>

Hi, Trond.
I should mention that, from my point of view, svc_destroy() has to be split into
two functions: svc_put() and __svc_destroy().
Anyway, previously we had one global counter per service, and we destroyed the
service when that counter reached zero.
Today the situation remains almost the same, except that we have an additional
per-net counter, which is used to manage per-net service resources.
IOW, when a service starts, it creates per-net resources in the current network
namespace and increases both the current per-net counter and the global service
counter.
The next service start request will do the same, and so on.
This means that:
1) when the per-net counter reaches zero, the per-net service resources have to
be released.
2) when the global counter reaches zero, the current user is the last one, and
only its resources are left.

Something like this...
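
The counting scheme can be sketched roughly as follows. This is a userspace
illustration, not the actual sunrpc code: struct svc_model, svc_model_get(),
svc_model_put() and the fixed-size per-net table are all hypothetical, and
namespaces are represented by small integers. It only shows a per-net counter
governing per-net resources and a global counter governing the service object.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NETS 4   /* arbitrary limit for this sketch */

struct svc_model {
    unsigned int users;                /* global counter */
    unsigned int net_users[MAX_NETS];  /* per-net counters */
    bool net_up[MAX_NETS];             /* do per-net resources exist? */
    bool alive;                        /* does the service object exist? */
};

/* Start (or reuse) the service on behalf of namespace 'net'. */
void svc_model_get(struct svc_model *s, int net)
{
    assert(net >= 0 && net < MAX_NETS);
    if (s->users++ == 0)
        s->alive = true;         /* first user creates the service */
    if (s->net_users[net]++ == 0)
        s->net_up[net] = true;   /* first user in this net creates its resources */
}

/* Drop one reference previously taken for namespace 'net'. */
void svc_model_put(struct svc_model *s, int net)
{
    assert(net >= 0 && net < MAX_NETS);
    assert(s->users > 0 && s->net_users[net] > 0);
    if (--s->net_users[net] == 0)
        s->net_up[net] = false;  /* case 1: per-net counter reached zero */
    if (--s->users == 0)
        s->alive = false;        /* case 2: the last user is gone */
}
```

With one user in namespace 0 and two in namespace 1, dropping the namespace 0
reference releases only that namespace's resources; the service object itself
goes away only with the final put.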

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 3.4. sunrpc oops during shutdown
  2012-05-28 23:43           ` Myklebust, Trond
  (?)
  (?)
@ 2012-05-29 11:21           ` bfields
  -1 siblings, 0 replies; 16+ messages in thread
From: bfields @ 2012-05-29 11:21 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Stanislav Kinsbursky, Dave Jones, linux-nfs, Linux Kernel

On Mon, May 28, 2012 at 11:43:40PM +0000, Myklebust, Trond wrote:
> On Fri, 2012-05-25 at 17:31 +0400, Stanislav Kinsbursky wrote:
> > Yes. But unfortunately, this won't be enough.
> > "NFS: callback threads containerization" patch set is required as well.
> > 
> > As a bugfix, I can suggest "SUNRPC: separate per-net data creation from service"
> > patch set + pass hard-coded "init_net" for NFS callback shutdown routines 
> > (instead of current->nsproxy->net_ns). This should work.
> 
> Hi Stanislav,
> 
> My question is why should svc_destroy() care about net namespaces at
> all? Once an application is calling svc_destroy(), it is trying to close
> down the entire service. It really should not matter to which net
> namespace a particular socket belongs: they _all_ need to be destroyed.

Services started in different network namespaces should be
independent--for example, starting nfsd in container A and then again in
container B, then shutting it down in container A, shouldn't also shut
down container B's service.

*But* there is currently only a single global server object, because
we're sharing threads:

	http://marc.info/?l=linux-nfs&m=133405747330055&w=2

	"Having Lockd thread (or NFSd threads) per container looks easy
	to implement on first sight. But kernel threads currently
	supported only in initial pid namespace. I.e. it means that
	per-container kernel thread won't be visible in container, if it
	has it's own pid namespace. And there is no way to put a kernel
	thread into container.  In OpenVZ we have per-container kernel
	threads. But integrating this feature to mainline looks hopeless
	(or very difficult) to me. At least for now.  So this problem
	with signals remains unsolved.

	"So, as it looks to me, this "one service per all" is the only
	one suitable for now."

so Stanislav is simulating multiple servers by shutting down sockets on
a per-net basis.
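
As a rough illustration of what shutting down sockets on a per-net basis means
for a single shared server object, here is a small userspace sketch. The names
struct sock_model, struct serv_model, serv_close_net() and serv_open_count()
are hypothetical, namespaces are plain integers, and a flat array stands in for
the server's socket lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sock_model {
    int net;     /* namespace the socket was created in */
    bool open;
};

struct serv_model {
    struct sock_model *socks;  /* all sockets of the one shared server */
    size_t nsocks;
};

/* Close only the sockets that belong to 'net'; return how many were closed. */
size_t serv_close_net(struct serv_model *serv, int net)
{
    size_t closed = 0;

    assert(serv != NULL);
    for (size_t i = 0; i < serv->nsocks; i++) {
        struct sock_model *sk = &serv->socks[i];

        if (sk->open && sk->net == net) {
            sk->open = false;
            closed++;
        }
    }
    return closed;
}

/* Count the sockets still open anywhere on the shared server. */
size_t serv_open_count(const struct serv_model *serv)
{
    size_t n = 0;

    for (size_t i = 0; i < serv->nsocks; i++)
        if (serv->socks[i].open)
            n++;
    return n;
}
```

Shutting down container A's service then amounts to serv_close_net(serv, A),
which leaves container B's sockets open on the same server object.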

But I think it should be possible to share threads between servers while
still behaving in every other way as if the servers are completely
independent.

--b.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-05-29 11:21 UTC | newest]

Thread overview: 16+ messages
2012-05-21 17:14 3.4. sunrpc oops during shutdown Dave Jones
2012-05-21 18:03 ` Myklebust, Trond
2012-05-21 18:03   ` Myklebust, Trond
2012-05-21 21:34   ` bfields
2012-05-24 15:55   ` bfields
2012-05-24 19:20     ` Myklebust, Trond
2012-05-24 19:20       ` Myklebust, Trond
2012-05-24 20:27       ` bfields
2012-05-25  8:12   ` Stanislav Kinsbursky
2012-05-25 13:07     ` Myklebust, Trond
2012-05-25 13:07       ` Myklebust, Trond
2012-05-25 13:31       ` Stanislav Kinsbursky
2012-05-28 23:43         ` Myklebust, Trond
2012-05-28 23:43           ` Myklebust, Trond
2012-05-29  8:48           ` Stanislav Kinsbursky
2012-05-29 11:21           ` bfields
