* rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
@ 2012-06-15 12:54 Harshula
  2012-06-15 13:21 ` Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Harshula @ 2012-06-15 12:54 UTC (permalink / raw)
  To: linux-nfs; +Cc: jlayton

Hi All,

Can I please get your feedback on the following?

rpciod is blocked:
------------------------------------------------------------
crash> bt 2507
PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14"
 #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
 #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
 #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
 #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
 #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
#24 [ffff8810343bfee8] kthread at ffffffff8108dd96
#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
------------------------------------------------------------

nfs_release_page() gives the kswapd process an exemption from being blocked.
Should we do the same for rpciod? Otherwise rpciod could end up in a
deadlock where it cannot continue without freeing memory that will only
become available when rpciod does its work:
------------------------------------------------------------
479 /*
480  * Attempt to release the private state associated with a page
481  * - Called if either PG_private or PG_fscache is set on the page
482  * - Caller holds page lock
483  * - Return true (may release page) or false (may not)
484  */
485 static int nfs_release_page(struct page *page, gfp_t gfp)
486 {
487         struct address_space *mapping = page->mapping;
488
489         dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
490
491         /* Only do I/O if gfp is a superset of GFP_KERNEL */
492         if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
493                 int how = FLUSH_SYNC;
494
495                 /* Don't let kswapd deadlock waiting for OOM RPC calls */
496                 if (current_is_kswapd())
497                         how = 0;
498                 nfs_commit_inode(mapping->host, how);
499         }
500         /* If PagePrivate() is set, then the page is not freeable */
501         if (PagePrivate(page))
502                 return 0;
503         return nfs_fscache_release_page(page, gfp);
504 }
------------------------------------------------------------

Another option is to change the priority of the memory allocation:
net/ipv4/af_inet.c
------------------------------------------------------------
 265 static int inet_create(struct net *net, struct socket *sock, int protocol,
 266                        int kern)
 267 {
...
 345         sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);
------------------------------------------------------------
Considering this is generic net code, I assume the GFP_KERNEL will not
be replaced with GFP_ATOMIC.

NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses
'legacy' workqueue code.

cya,
#

^ permalink raw reply [flat|nested] 23+ messages in thread
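To make the proposal above concrete: a minimal sketch of the suggested exemption, assuming a hypothetical current_is_rpciod() helper (no such helper exists in this kernel; something would have to provide it):
------------------------------------------------------------
static int nfs_release_page(struct page *page, gfp_t gfp)
{
        struct address_space *mapping = page->mapping;

        dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);

        /* Only do I/O if gfp is a superset of GFP_KERNEL */
        if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
                int how = FLUSH_SYNC;

                /* Don't let kswapd -- or, per the proposal, rpciod,
                 * which must stay runnable for the COMMIT to ever
                 * complete -- block on a synchronous commit;
                 * how == 0 starts the commit asynchronously */
                if (current_is_kswapd() || current_is_rpciod())
                        how = 0;
                nfs_commit_inode(mapping->host, how);
        }
        /* If PagePrivate() is set, then the page is not freeable */
        if (PagePrivate(page))
                return 0;
        return nfs_fscache_release_page(page, gfp);
}
------------------------------------------------------------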
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-15 12:54 rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete Harshula @ 2012-06-15 13:21 ` Jeff Layton 2012-06-15 21:25 ` Myklebust, Trond 0 siblings, 1 reply; 23+ messages in thread From: Jeff Layton @ 2012-06-15 13:21 UTC (permalink / raw) To: Harshula; +Cc: linux-nfs On Fri, 15 Jun 2012 22:54:10 +1000 Harshula <harshula@redhat.com> wrote: > Hi All, > > Can I please get your feedback on the following? > > rpciod is blocked: > ------------------------------------------------------------ > crash> bt 2507 > PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" > #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 > #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] > #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f > #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 > #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] > #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] > #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 > #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 > #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 > #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca > ------------------------------------------------------------ > > nfs_release_page() gives kswapd process an exemption from being blocked. > Should we do the same for rpciod. 
Otherwise rpciod could end up in a > deadlock where it can not continue without freeing memory that will only > become available when rpciod does its work: > ------------------------------------------------------------ > 479 /* > 480 * Attempt to release the private state associated with a page > 481 * - Called if either PG_private or PG_fscache is set on the page > 482 * - Caller holds page lock > 483 * - Return true (may release page) or false (may not) > 484 */ > 485 static int nfs_release_page(struct page *page, gfp_t gfp) > 486 { > 487 struct address_space *mapping = page->mapping; > 488 > 489 dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page); > 490 > 491 /* Only do I/O if gfp is a superset of GFP_KERNEL */ > 492 if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) { > 493 int how = FLUSH_SYNC; > 494 > 495 /* Don't let kswapd deadlock waiting for OOM RPC calls */ > 496 if (current_is_kswapd()) > 497 how = 0; > 498 nfs_commit_inode(mapping->host, how); > 499 } > 500 /* If PagePrivate() is set, then the page is not freeable */ > 501 if (PagePrivate(page)) > 502 return 0; > 503 return nfs_fscache_release_page(page, gfp); > 504 } > ------------------------------------------------------------ > > Another option is to change the priority of the memory allocation: > net/ipv4/af_inet.c > ------------------------------------------------------------ > 265 static int inet_create(struct net *net, struct socket *sock, int > protocol, > 266 int kern) > 267 { > ... > 345 sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot); > ------------------------------------------------------------ > Considering this is generic net code, I assume the GFP_KERNEL will not > be replaced with GFP_ATOMIC. > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses > 'legacy' workqueue code. > > cya, > # > I suspect this is also a problem in mainline, but maybe some of the recent writeback changes prevent it... I think the right solution here is to make nfs_release_page treat rpciod similarly to kswapd. Easier said than done though -- you'll need to come up with a way to determine if you're running in rpciod context... -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
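One way such a test could conceivably be built, sketched under the assumption that a spare bit in task_struct->flags can be claimed (PF_RPCIOD is an invented name, not an existing kernel flag, and the value shown may collide with a real PF_* bit):
------------------------------------------------------------
/* Invented for this sketch; a real patch would need a genuinely
 * unused bit in task_struct->flags. */
#define PF_RPCIOD       0x10000000

/* Called once from each rpciod worker thread as it starts up;
 * the flag is never cleared for the life of the thread. */
static void rpciod_mark_current(void)
{
        current->flags |= PF_RPCIOD;
}

/* Tested wherever rpciod context matters, e.g. nfs_release_page(): */
static inline int current_is_rpciod(void)
{
        return current->flags & PF_RPCIOD;
}
------------------------------------------------------------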
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-15 13:21 ` Jeff Layton @ 2012-06-15 21:25 ` Myklebust, Trond 2012-06-20 14:14 ` Jeff Layton 2012-06-27 15:54 ` Jeff Layton 0 siblings, 2 replies; 23+ messages in thread From: Myklebust, Trond @ 2012-06-15 21:25 UTC (permalink / raw) To: Jeff Layton; +Cc: Harshula, linux-nfs

On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote:
> On Fri, 15 Jun 2012 22:54:10 +1000
> Harshula <harshula@redhat.com> wrote:
> 
> > [...]
> 
> I suspect this is also a problem in mainline, but maybe some of the
> recent writeback changes prevent it...
> 
> I think the right solution here is to make nfs_release_page treat rpciod
> similarly to kswapd. Easier said than done though -- you'll need to
> come up with a way to determine if you're running in rpciod context...

No. The _right_ solution is to ensure that rpciod doesn't do allocations
that result in a page reclaim... try_to_release_page() is just the tip
of the iceberg of crazy deadlocks that this socket allocation can get us
into.

Unfortunately, selinux & co. prevent us from allocating the sockets in
user contexts, and anyway, having to wait for another thread to do the
same allocation isn't doing to help prevent the deadlock...

I know that Mel Gorman's NFS swap patches had some protections against
this sort of deadlock. Perhaps we can look at how he was doing this?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-15 21:25 ` Myklebust, Trond @ 2012-06-20 14:14 ` Jeff Layton 2012-06-28 14:31 ` Mel Gorman 2012-06-27 15:54 ` Jeff Layton 1 sibling, 1 reply; 23+ messages in thread From: Jeff Layton @ 2012-06-20 14:14 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Harshula, linux-nfs, mgorman On Fri, 15 Jun 2012 21:25:10 +0000 "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote: > > On Fri, 15 Jun 2012 22:54:10 +1000 > > Harshula <harshula@redhat.com> wrote: > > > > > Hi All, > > > > > > Can I please get your feedback on the following? > > > > > > rpciod is blocked: > > > ------------------------------------------------------------ > > > crash> bt 2507 > > > PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" > > > #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 > > > #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] > > > #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f > > > #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 > > > #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] > > > #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] > > > #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 > > > #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 > > > #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 > > > #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f > > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e > > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f > > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad > > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 > > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a > > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 > > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b > > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 > > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c > > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 > > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 > > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] > > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] > > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 > > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 > > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca > > > ------------------------------------------------------------ > > > > > > nfs_release_page() gives kswapd process an exemption from being blocked. > > > Should we do the same for rpciod. 
Otherwise rpciod could end up in a > > > deadlock where it can not continue without freeing memory that will only > > > become available when rpciod does its work: > > > ------------------------------------------------------------ > > > 479 /* > > > 480 * Attempt to release the private state associated with a page > > > 481 * - Called if either PG_private or PG_fscache is set on the page > > > 482 * - Caller holds page lock > > > 483 * - Return true (may release page) or false (may not) > > > 484 */ > > > 485 static int nfs_release_page(struct page *page, gfp_t gfp) > > > 486 { > > > 487 struct address_space *mapping = page->mapping; > > > 488 > > > 489 dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page); > > > 490 > > > 491 /* Only do I/O if gfp is a superset of GFP_KERNEL */ > > > 492 if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) { > > > 493 int how = FLUSH_SYNC; > > > 494 > > > 495 /* Don't let kswapd deadlock waiting for OOM RPC calls */ > > > 496 if (current_is_kswapd()) > > > 497 how = 0; > > > 498 nfs_commit_inode(mapping->host, how); > > > 499 } > > > 500 /* If PagePrivate() is set, then the page is not freeable */ > > > 501 if (PagePrivate(page)) > > > 502 return 0; > > > 503 return nfs_fscache_release_page(page, gfp); > > > 504 } > > > ------------------------------------------------------------ > > > > > > Another option is to change the priority of the memory allocation: > > > net/ipv4/af_inet.c > > > ------------------------------------------------------------ > > > 265 static int inet_create(struct net *net, struct socket *sock, int > > > protocol, > > > 266 int kern) > > > 267 { > > > ... > > > 345 sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot); > > > ------------------------------------------------------------ > > > Considering this is generic net code, I assume the GFP_KERNEL will not > > > be replaced with GFP_ATOMIC. > > > > > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses > > > 'legacy' workqueue code. > > > > > > cya, > > > # > > > > > > > I suspect this is also a problem in mainline, but maybe some of the > > recent writeback changes prevent it... > > > > I think the right solution here is to make nfs_release_page treat rpciod > > similarly to kswapd. Easier said than done though -- you'll need to > > come up with a way to determine if you're running in rpciod context... > > No. The _right_ solution is to ensure that rpciod doesn't do allocations > that result in a page reclaim... try_to_release_page() is just the tip > of the iceberg of crazy deadlocks that this socket allocation can get us > into. > > Unfortunately, selinux & co. prevent us from allocating the sockets in > user contexts, and anyway, having to wait for another thread to do the > same allocation isn't doing to help prevent the deadlock... > My thinking was that we could return without releasing the page, and eventually kmalloc would get to one that was able to be reclaimed via other means. That's probably too optimistic though... > I know that Mel Gorman's NFS swap patches had some protections against > this sort of deadlock. Perhaps we can look at how he was doing this? > (cc'ing Mel in case he has thoughts) I looked over Mel's patches and they mainly seem to affect how memory for the socket is handled after it has been created. In this case, we're hanging on the allocation of a new socket altogether. 
It does do this before calling xs_create_sock however: + if (xprt->swapper) + current->flags |= PF_MEMALLOC; + Perhaps we should just set PF_MEMALLOC unconditionally there? We might still need Mel's other patches that add SOCK_MEMALLOC to ensure that it can actually be connected, but that's a bit of a different problem. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
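Spelled out, the unconditional variant being floated here would look roughly like the sketch below (not a tested patch against xprtsock.c; as Mel's reply further down explains, unconditional PF_MEMALLOC risks exhausting the emergency reserves):
------------------------------------------------------------
static void xs_tcp_setup_socket(struct work_struct *work)
{
        /* remember whether the task already had PF_MEMALLOC set */
        unsigned long pflags = current->flags;

        /* Let the socket allocation dip into the emergency reserves,
         * so direct reclaim is never entered and nfs_release_page()
         * is never reached from this thread. */
        current->flags |= PF_MEMALLOC;

        /* ... existing socket creation and connect logic, i.e. the
         * xs_create_sock() call seen in the backtrace ... */

        /* restore the caller's original PF_MEMALLOC state */
        current->flags &= ~PF_MEMALLOC;
        current->flags |= pflags & PF_MEMALLOC;
}
------------------------------------------------------------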
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-20 14:14 ` Jeff Layton @ 2012-06-28 14:31 ` Mel Gorman 2012-06-28 14:55 ` Jeff Layton 0 siblings, 1 reply; 23+ messages in thread From: Mel Gorman @ 2012-06-28 14:31 UTC (permalink / raw) To: Jeff Layton; +Cc: Myklebust, Trond, Harshula, linux-nfs I am not on the linux-nfs list so I lack context about this issue so take all this with a grain of salt. On Wed, Jun 20, 2012 at 07:14:22AM -0700, Jeff Layton wrote: > On Fri, 15 Jun 2012 21:25:10 +0000 > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > > On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote: > > > On Fri, 15 Jun 2012 22:54:10 +1000 > > > Harshula <harshula@redhat.com> wrote: > > > > > > > Hi All, > > > > > > > > Can I please get your feedback on the following? > > > > > > > > rpciod is blocked: > > > > ------------------------------------------------------------ > > > > crash> bt 2507 > > > > PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" > > > > #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 > > > > #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] > > > > #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f > > > > #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 > > > > #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] > > > > #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] > > > > #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 > > > > #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 > > > > #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 > > > > #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f > > > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e > > > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f > > > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad > > > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 > > > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a > > > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 > > > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b > > > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 > > > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c > > > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 > > > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 > > > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] > > > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] > > > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 > > > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 > > > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca > > > > ------------------------------------------------------------ > > > > > > > > nfs_release_page() gives kswapd process an exemption from being blocked. > > > > Should we do the same for rpciod. Otherwise rpciod could end up in a > > > > deadlock where it can not continue without freeing memory that will only > > > > become available when rpciod does its work: Typically processes don't get an exception as such. Filesystems and callers involved in IO are meant to use GFP_NOFS and GFP_NOIO as appropriate to stop reclaim causing them to block on themselves. 
> > > > ------------------------------------------------------------ > > > > 479 /* > > > > 480 * Attempt to release the private state associated with a page > > > > 481 * - Called if either PG_private or PG_fscache is set on the page > > > > 482 * - Caller holds page lock > > > > 483 * - Return true (may release page) or false (may not) > > > > 484 */ > > > > 485 static int nfs_release_page(struct page *page, gfp_t gfp) > > > > 486 { > > > > 487 struct address_space *mapping = page->mapping; > > > > 488 > > > > 489 dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page); > > > > 490 > > > > 491 /* Only do I/O if gfp is a superset of GFP_KERNEL */ > > > > 492 if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) { > > > > 493 int how = FLUSH_SYNC; > > > > 494 > > > > 495 /* Don't let kswapd deadlock waiting for OOM RPC calls */ > > > > 496 if (current_is_kswapd()) > > > > 497 how = 0; > > > > 498 nfs_commit_inode(mapping->host, how); > > > > 499 } > > > > 500 /* If PagePrivate() is set, then the page is not freeable */ > > > > 501 if (PagePrivate(page)) > > > > 502 return 0; > > > > 503 return nfs_fscache_release_page(page, gfp); > > > > 504 } > > > > ------------------------------------------------------------ > > > > > > > > Another option is to change the priority of the memory allocation: > > > > net/ipv4/af_inet.c > > > > ------------------------------------------------------------ > > > > 265 static int inet_create(struct net *net, struct socket *sock, int > > > > protocol, > > > > 266 int kern) > > > > 267 { > > > > ... > > > > 345 sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot); > > > > ------------------------------------------------------------ > > > > Considering this is generic net code, I assume the GFP_KERNEL will not > > > > be replaced with GFP_ATOMIC. > > > > > > > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses > > > > 'legacy' workqueue code. > > > > > > > > cya, > > > > # > > > > > > > > > > I suspect this is also a problem in mainline, but maybe some of the > > > recent writeback changes prevent it... > > > > > > I think the right solution here is to make nfs_release_page treat rpciod > > > similarly to kswapd. Easier said than done though -- you'll need to > > > come up with a way to determine if you're running in rpciod context... > > > > No. The _right_ solution is to ensure that rpciod doesn't do allocations > > that result in a page reclaim... try_to_release_page() is just the tip > > of the iceberg of crazy deadlocks that this socket allocation can get us > > into. > > Fully agreed. Callers that cannot dispatch IO or do filesystem based operations should not specify __GFP_FS or __GFP_IO. > > Unfortunately, selinux & co. prevent us from allocating the sockets in > > user contexts, and anyway, having to wait for another thread to do the > > same allocation isn't doing to help prevent the deadlock... > > > > My thinking was that we could return without releasing the page, and > eventually kmalloc would get to one that was able to be reclaimed via > other means. That's probably too optimistic though... > This is what is meant to happen if you use GFP flags appropriately. The pages that cannot be reclaimed are skipped over until a clean page can be discarded for example. In some cases it might still be possible to swap out a page. > > I know that Mel Gorman's NFS swap patches had some protections against > > this sort of deadlock. Perhaps we can look at how he was doing this? 
> > > > (cc'ing Mel in case he has thoughts) > This is not what those patches are for. They are specific to the case where the network traffic is related to swapping. It allows temporary overriding of the watermarks to receive and send packets related to swap and discards all others. It is not a general solution for the sort of problem you are seeing. > I looked over Mel's patches and they mainly seem to affect how memory > for the socket is handled after it has been created. In this case, we're > hanging on the allocation of a new socket altogether. > > It does do this before calling xs_create_sock however: > > + if (xprt->swapper) > + current->flags |= PF_MEMALLOC; > + > > Perhaps we should just set PF_MEMALLOC unconditionally there? No, that will cause the system to deadlock under very heavy memory pressure. It's not always easy to trigger such a situation but it is always the risk with PF_MEMALLOC is abused. > We might > still need Mel's other patches that add SOCK_MEMALLOC to ensure that it > can actually be connected, but that's a bit of a different problem. > Do not do this. The SOCK_MEMALLOC stuff is specific to swapping. It is not a substitute to avoid using the correct flags. Abusing PF_MEMALLOC leads to deadlock. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
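For reference, the distinction Mel draws is carried entirely by the GFP flags passed at allocation time. The flag compositions below are from include/linux/gfp.h of this era; the sk_alloc() line reuses the names from the inet_create() excerpt quoted above and only illustrates what a reclaim-safe allocation would look like if the socket path could be taught to pass such flags, which it currently cannot:
------------------------------------------------------------
/* GFP_KERNEL = __GFP_WAIT | __GFP_IO | __GFP_FS:
 *     reclaim may write back pages and re-enter filesystems,
 *     i.e. may block in nfs_release_page() -> nfs_commit_inode()
 * GFP_NOFS   = __GFP_WAIT | __GFP_IO:
 *     reclaim may start block I/O but not call into filesystems
 * GFP_NOIO   = __GFP_WAIT:
 *     reclaim may only reap clean pages; no new I/O at all
 *
 * A reclaim-safe socket allocation would therefore read:
 */
sk = sk_alloc(net, PF_INET, GFP_NOFS, answer_prot);
------------------------------------------------------------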
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-28 14:31 ` Mel Gorman @ 2012-06-28 14:55 ` Jeff Layton 0 siblings, 0 replies; 23+ messages in thread From: Jeff Layton @ 2012-06-28 14:55 UTC (permalink / raw) To: Mel Gorman; +Cc: Myklebust, Trond, Harshula, linux-nfs On Thu, 28 Jun 2012 15:31:57 +0100 Mel Gorman <mgorman@suse.de> wrote: > I am not on the linux-nfs list so I lack context about this issue so > take all this with a grain of salt. > > On Wed, Jun 20, 2012 at 07:14:22AM -0700, Jeff Layton wrote: > > On Fri, 15 Jun 2012 21:25:10 +0000 > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > > > > On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote: > > > > On Fri, 15 Jun 2012 22:54:10 +1000 > > > > Harshula <harshula@redhat.com> wrote: > > > > > > > > > Hi All, > > > > > > > > > > Can I please get your feedback on the following? > > > > > > > > > > rpciod is blocked: > > > > > ------------------------------------------------------------ > > > > > crash> bt 2507 > > > > > PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" > > > > > #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 > > > > > #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] > > > > > #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f > > > > > #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 > > > > > #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] > > > > > #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] > > > > > #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 > > > > > #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 > > > > > #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 > > > > > #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f > > > > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e > > > > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f > > > > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad > > > > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 > > > > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a > > > > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 > > > > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b > > > > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 > > > > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c > > > > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 > > > > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 > > > > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] > > > > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] > > > > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 > > > > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 > > > > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca > > > > > ------------------------------------------------------------ > > > > > > > > > > nfs_release_page() gives kswapd process an exemption from being blocked. > > > > > Should we do the same for rpciod. Otherwise rpciod could end up in a > > > > > deadlock where it can not continue without freeing memory that will only > > > > > become available when rpciod does its work: > > Typically processes don't get an exception as such. Filesystems and > callers involved in IO are meant to use GFP_NOFS and GFP_NOIO as > appropriate to stop reclaim causing them to block on themselves. 
> > > > > > ------------------------------------------------------------ > > > > > 479 /* > > > > > 480 * Attempt to release the private state associated with a page > > > > > 481 * - Called if either PG_private or PG_fscache is set on the page > > > > > 482 * - Caller holds page lock > > > > > 483 * - Return true (may release page) or false (may not) > > > > > 484 */ > > > > > 485 static int nfs_release_page(struct page *page, gfp_t gfp) > > > > > 486 { > > > > > 487 struct address_space *mapping = page->mapping; > > > > > 488 > > > > > 489 dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page); > > > > > 490 > > > > > 491 /* Only do I/O if gfp is a superset of GFP_KERNEL */ > > > > > 492 if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) { > > > > > 493 int how = FLUSH_SYNC; > > > > > 494 > > > > > 495 /* Don't let kswapd deadlock waiting for OOM RPC calls */ > > > > > 496 if (current_is_kswapd()) > > > > > 497 how = 0; > > > > > 498 nfs_commit_inode(mapping->host, how); > > > > > 499 } > > > > > 500 /* If PagePrivate() is set, then the page is not freeable */ > > > > > 501 if (PagePrivate(page)) > > > > > 502 return 0; > > > > > 503 return nfs_fscache_release_page(page, gfp); > > > > > 504 } > > > > > ------------------------------------------------------------ > > > > > > > > > > Another option is to change the priority of the memory allocation: > > > > > net/ipv4/af_inet.c > > > > > ------------------------------------------------------------ > > > > > 265 static int inet_create(struct net *net, struct socket *sock, int > > > > > protocol, > > > > > 266 int kern) > > > > > 267 { > > > > > ... > > > > > 345 sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot); > > > > > ------------------------------------------------------------ > > > > > Considering this is generic net code, I assume the GFP_KERNEL will not > > > > > be replaced with GFP_ATOMIC. > > > > > > > > > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses > > > > > 'legacy' workqueue code. > > > > > > > > > > cya, > > > > > # > > > > > > > > > > > > > I suspect this is also a problem in mainline, but maybe some of the > > > > recent writeback changes prevent it... > > > > > > > > I think the right solution here is to make nfs_release_page treat rpciod > > > > similarly to kswapd. Easier said than done though -- you'll need to > > > > come up with a way to determine if you're running in rpciod context... > > > > > > No. The _right_ solution is to ensure that rpciod doesn't do allocations > > > that result in a page reclaim... try_to_release_page() is just the tip > > > of the iceberg of crazy deadlocks that this socket allocation can get us > > > into. > > > > > Fully agreed. Callers that cannot dispatch IO or do filesystem based > operations should not specify __GFP_FS or __GFP_IO. > > > > Unfortunately, selinux & co. prevent us from allocating the sockets in > > > user contexts, and anyway, having to wait for another thread to do the > > > same allocation isn't doing to help prevent the deadlock... > > > > > > > My thinking was that we could return without releasing the page, and > > eventually kmalloc would get to one that was able to be reclaimed via > > other means. That's probably too optimistic though... > > > > This is what is meant to happen if you use GFP flags appropriately. The > pages that cannot be reclaimed are skipped over until a clean page can > be discarded for example. In some cases it might still be possible to > swap out a page. 
> > > I know that Mel Gorman's NFS swap patches had some protections against > > > this sort of deadlock. Perhaps we can look at how he was doing this? > > > > > > > (cc'ing Mel in case he has thoughts) > > > > This is not what those patches are for. They are specific to the case > where the network traffic is related to swapping. It allows temporary > overriding of the watermarks to receive and send packets related to swap > and discards all others. It is not a general solution for the sort of > problem you are seeing. > > > I looked over Mel's patches and they mainly seem to affect how memory > > for the socket is handled after it has been created. In this case, we're > > hanging on the allocation of a new socket altogether. > > > > It does do this before calling xs_create_sock however: > > > > + if (xprt->swapper) > > + current->flags |= PF_MEMALLOC; > > + > > > > Perhaps we should just set PF_MEMALLOC unconditionally there? > > No, that will cause the system to deadlock under very heavy memory > pressure. It's not always easy to trigger such a situation but it is always > the risk with PF_MEMALLOC is abused. > > > We might > > still need Mel's other patches that add SOCK_MEMALLOC to ensure that it > > can actually be connected, but that's a bit of a different problem. > > > > Do not do this. The SOCK_MEMALLOC stuff is specific to swapping. It is > not a substitute to avoid using the correct flags. Abusing PF_MEMALLOC > leads to deadlock. > Ok, thanks for having a look. Sorry to add you in the middle of the thread, that does make it a bit difficult to understand what's happening. It would be nice if we could pass GFP flags appropriately here, but the problem is that the GFP flags for the socket allocation are buried way down deep inside of inet_create(). See the sk_alloc() call in there. Now, we could fix it up so that __sock_create passes a set of GFP flags all the way down to that point, but even that won't really save us here in all cases. There's a sock_alloc() call in there which will need an inode. __sock_create also does a request_module() and that can certainly require memory too. Good luck getting that to take GFP flags. The right approach I think is to do what Trond suggested in an earlier mail and leverage the PF_FSTRANS flag... -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
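A rough sketch of the PF_FSTRANS approach referred to here (PF_FSTRANS is a real task flag, used by XFS at this time; the placement below is illustrative, not the eventual patch):
------------------------------------------------------------
/* In the rpciod work item, e.g. around the xs_create_sock() call:
 * mark the task as being inside a filesystem transaction, saving
 * and restoring any pre-existing PF_FSTRANS state. */
unsigned long pflags = current->flags;

current->flags |= PF_FSTRANS;
/* ... allocation-prone transport work runs here ... */
current->flags &= ~PF_FSTRANS;
current->flags |= pflags & PF_FSTRANS;

/* In nfs_release_page(): skip the synchronous COMMIT entirely when
 * the allocating task is already inside a filesystem transaction. */
if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL &&
    !(current->flags & PF_FSTRANS)) {
        int how = FLUSH_SYNC;

        if (current_is_kswapd())
                how = 0;
        nfs_commit_inode(mapping->host, how);
}
------------------------------------------------------------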
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-15 21:25 ` Myklebust, Trond 2012-06-20 14:14 ` Jeff Layton @ 2012-06-27 15:54 ` Jeff Layton 2012-06-27 18:43 ` Myklebust, Trond 1 sibling, 1 reply; 23+ messages in thread From: Jeff Layton @ 2012-06-27 15:54 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Harshula, linux-nfs On Fri, 15 Jun 2012 21:25:10 +0000 "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote: > > On Fri, 15 Jun 2012 22:54:10 +1000 > > Harshula <harshula@redhat.com> wrote: > > > > > Hi All, > > > > > > Can I please get your feedback on the following? > > > > > > rpciod is blocked: > > > ------------------------------------------------------------ > > > crash> bt 2507 > > > PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" > > > #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 > > > #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] > > > #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f > > > #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 > > > #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] > > > #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] > > > #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 > > > #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 > > > #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 > > > #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f > > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e > > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f > > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad > > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 > > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a > > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 > > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b > > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 > > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c > > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 > > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 > > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] > > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] > > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 > > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 > > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca > > > ------------------------------------------------------------ > > > > > > nfs_release_page() gives kswapd process an exemption from being blocked. > > > Should we do the same for rpciod. 
Otherwise rpciod could end up in a > > > deadlock where it can not continue without freeing memory that will only > > > become available when rpciod does its work: > > > ------------------------------------------------------------ > > > 479 /* > > > 480 * Attempt to release the private state associated with a page > > > 481 * - Called if either PG_private or PG_fscache is set on the page > > > 482 * - Caller holds page lock > > > 483 * - Return true (may release page) or false (may not) > > > 484 */ > > > 485 static int nfs_release_page(struct page *page, gfp_t gfp) > > > 486 { > > > 487 struct address_space *mapping = page->mapping; > > > 488 > > > 489 dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page); > > > 490 > > > 491 /* Only do I/O if gfp is a superset of GFP_KERNEL */ > > > 492 if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) { > > > 493 int how = FLUSH_SYNC; > > > 494 > > > 495 /* Don't let kswapd deadlock waiting for OOM RPC calls */ > > > 496 if (current_is_kswapd()) > > > 497 how = 0; > > > 498 nfs_commit_inode(mapping->host, how); > > > 499 } > > > 500 /* If PagePrivate() is set, then the page is not freeable */ > > > 501 if (PagePrivate(page)) > > > 502 return 0; > > > 503 return nfs_fscache_release_page(page, gfp); > > > 504 } > > > ------------------------------------------------------------ > > > > > > Another option is to change the priority of the memory allocation: > > > net/ipv4/af_inet.c > > > ------------------------------------------------------------ > > > 265 static int inet_create(struct net *net, struct socket *sock, int > > > protocol, > > > 266 int kern) > > > 267 { > > > ... > > > 345 sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot); > > > ------------------------------------------------------------ > > > Considering this is generic net code, I assume the GFP_KERNEL will not > > > be replaced with GFP_ATOMIC. > > > > > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses > > > 'legacy' workqueue code. > > > > > > cya, > > > # > > > > > > > I suspect this is also a problem in mainline, but maybe some of the > > recent writeback changes prevent it... > > > > I think the right solution here is to make nfs_release_page treat rpciod > > similarly to kswapd. Easier said than done though -- you'll need to > > come up with a way to determine if you're running in rpciod context... > > No. The _right_ solution is to ensure that rpciod doesn't do allocations > that result in a page reclaim... try_to_release_page() is just the tip > of the iceberg of crazy deadlocks that this socket allocation can get us > into. > > Unfortunately, selinux & co. prevent us from allocating the sockets in > user contexts, and anyway, having to wait for another thread to do the > same allocation isn't doing to help prevent the deadlock... > > I know that Mel Gorman's NFS swap patches had some protections against > this sort of deadlock. Perhaps we can look at how he was doing this? > Actually...now that I've looked at this, I think the right solution here is to stop calling sock_release() on these sockets until we're in xs_destroy. IOW, don't free the socket altogether unless we're truly tearing down the connection for good. That means though that we need to convert the places that currently call rpc_xprt_ops->close to call something like a (new) ops->shutdown routine instead, with the assumption of course that we can somehow reset and reuse the socket afterward. The question is how best to emulate the effect of ->close without actually freeing the socket. 
Is it sufficient to call something like kernel_sock_shutdown() on it, and twiddle all of the bits to ensure that the connect_worker will get called afterward? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
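A possible shape for that ->shutdown operation, sketched with illustrative names (struct sock_xprt, kernel_sock_shutdown() and xprt_disconnect_done() exist in this era's kernel; the operation itself does not):
------------------------------------------------------------
static void xs_shutdown(struct rpc_xprt *xprt)
{
        struct sock_xprt *transport =
                        container_of(xprt, struct sock_xprt, xprt);
        struct socket *sock = transport->sock;

        if (sock == NULL)
                return;
        /* Quiesce the connection but keep the struct socket around,
         * so reconnecting needs no fresh sk_alloc(). */
        kernel_sock_shutdown(sock, SHUT_RDWR);
        /* Clear connected state so the connect worker is scheduled
         * again on the next task; the socket itself is reused. */
        xprt_disconnect_done(xprt);
}
------------------------------------------------------------
Whether a TCP socket quiesced this way can simply be reconnected is exactly the open question posed above; Trond's reply below explains why the sockets are closed outright today.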
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-27 15:54 ` Jeff Layton @ 2012-06-27 18:43 ` Myklebust, Trond 2012-06-27 19:28 ` Jeff Layton 0 siblings, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-06-27 18:43 UTC (permalink / raw) To: Jeff Layton; +Cc: Harshula, linux-nfs

On Wed, 2012-06-27 at 11:54 -0400, Jeff Layton wrote:
> On Fri, 15 Jun 2012 21:25:10 +0000
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> 
> > [...]
> 
> Actually...now that I've looked at this, I think the right solution
> here is to stop calling sock_release() on these sockets until we're in
> xs_destroy. IOW, don't free the socket altogether unless we're truly
> tearing down the connection for good. That means though that we need to
> convert the places that currently call rpc_xprt_ops->close to call
> something like a (new) ops->shutdown routine instead, with the
> assumption of course that we can somehow reset and reuse the socket
> afterward.
> 
> The question is how best to emulate the effect of ->close without
> actually freeing the socket. Is it sufficient to call something like
> kernel_sock_shutdown() on it, and twiddle all of the bits to ensure
> that the connect_worker will get called afterward?

The reason why we close these sockets is that if the attempt at aborting
the connection fails, then they typically end up in a TIME_WAIT state.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-27 18:43 ` Myklebust, Trond @ 2012-06-27 19:28 ` Jeff Layton 2012-06-27 19:37 ` Myklebust, Trond 2012-06-27 19:56 ` Myklebust, Trond 0 siblings, 2 replies; 23+ messages in thread From: Jeff Layton @ 2012-06-27 19:28 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Harshula, linux-nfs On Wed, 27 Jun 2012 18:43:56 +0000 "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > On Wed, 2012-06-27 at 11:54 -0400, Jeff Layton wrote: > > On Fri, 15 Jun 2012 21:25:10 +0000 > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > > > > On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote: > > > > On Fri, 15 Jun 2012 22:54:10 +1000 > > > > Harshula <harshula@redhat.com> wrote: > > > > > > > > > Hi All, > > > > > > > > > > Can I please get your feedback on the following? > > > > > > > > > > rpciod is blocked: > > > > > ------------------------------------------------------------ > > > > > crash> bt 2507 > > > > > PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" > > > > > #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 > > > > > #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] > > > > > #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f > > > > > #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 > > > > > #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] > > > > > #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] > > > > > #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 > > > > > #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 > > > > > #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 > > > > > #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f > > > > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e > > > > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f > > > > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad > > > > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 > > > > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a > > > > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 > > > > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b > > > > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 > > > > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c > > > > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 > > > > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 > > > > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] > > > > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] > > > > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 > > > > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 > > > > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca > > > > > ------------------------------------------------------------ > > > > > > > > > > nfs_release_page() gives kswapd process an exemption from being blocked. > > > > > Should we do the same for rpciod. 
Otherwise rpciod could end up in a > > > > > deadlock where it can not continue without freeing memory that will only > > > > > become available when rpciod does its work: > > > > > ------------------------------------------------------------ > > > > > 479 /* > > > > > 480 * Attempt to release the private state associated with a page > > > > > 481 * - Called if either PG_private or PG_fscache is set on the page > > > > > 482 * - Caller holds page lock > > > > > 483 * - Return true (may release page) or false (may not) > > > > > 484 */ > > > > > 485 static int nfs_release_page(struct page *page, gfp_t gfp) > > > > > 486 { > > > > > 487 struct address_space *mapping = page->mapping; > > > > > 488 > > > > > 489 dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page); > > > > > 490 > > > > > 491 /* Only do I/O if gfp is a superset of GFP_KERNEL */ > > > > > 492 if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) { > > > > > 493 int how = FLUSH_SYNC; > > > > > 494 > > > > > 495 /* Don't let kswapd deadlock waiting for OOM RPC calls */ > > > > > 496 if (current_is_kswapd()) > > > > > 497 how = 0; > > > > > 498 nfs_commit_inode(mapping->host, how); > > > > > 499 } > > > > > 500 /* If PagePrivate() is set, then the page is not freeable */ > > > > > 501 if (PagePrivate(page)) > > > > > 502 return 0; > > > > > 503 return nfs_fscache_release_page(page, gfp); > > > > > 504 } > > > > > ------------------------------------------------------------ > > > > > > > > > > Another option is to change the priority of the memory allocation: > > > > > net/ipv4/af_inet.c > > > > > ------------------------------------------------------------ > > > > > 265 static int inet_create(struct net *net, struct socket *sock, int > > > > > protocol, > > > > > 266 int kern) > > > > > 267 { > > > > > ... > > > > > 345 sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot); > > > > > ------------------------------------------------------------ > > > > > Considering this is generic net code, I assume the GFP_KERNEL will not > > > > > be replaced with GFP_ATOMIC. > > > > > > > > > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses > > > > > 'legacy' workqueue code. > > > > > > > > > > cya, > > > > > # > > > > > > > > > > > > > I suspect this is also a problem in mainline, but maybe some of the > > > > recent writeback changes prevent it... > > > > > > > > I think the right solution here is to make nfs_release_page treat rpciod > > > > similarly to kswapd. Easier said than done though -- you'll need to > > > > come up with a way to determine if you're running in rpciod context... > > > > > > No. The _right_ solution is to ensure that rpciod doesn't do allocations > > > that result in a page reclaim... try_to_release_page() is just the tip > > > of the iceberg of crazy deadlocks that this socket allocation can get us > > > into. > > > > > > Unfortunately, selinux & co. prevent us from allocating the sockets in > > > user contexts, and anyway, having to wait for another thread to do the > > > same allocation isn't doing to help prevent the deadlock... > > > > > > I know that Mel Gorman's NFS swap patches had some protections against > > > this sort of deadlock. Perhaps we can look at how he was doing this? > > > > > > > Actually...now that I've looked at this, I think the right solution > > here is to stop calling sock_release() on these sockets until we're in > > xs_destroy. IOW, don't free the socket altogether unless we're truly > > tearing down the connection for good. 
That means though that we need to > > convert the places that currently call rpc_xprt_ops->close to call > > something like a (new) ops->shutdown routine instead, with the > > assumption of course that we can somehow reset and reuse the socket > > afterward. > > > > The question is how best to emulate the effect of ->close without > > actually freeing the socket. Is it sufficient to call something like > > kernel_sock_shutdown() on it, and twiddle all of the bits to ensure > > that the connect_worker will get called afterward? > > The reason why we close these sockets is that if the attempt at aborting > the connection fails, then they typically end up in a TIME_WAIT state. > I'm still trying to wade through the xprtsock.c socket handling code, but it looks like we currently tear down the connection in 3 different ways: xs_close: which basically calls sock_release and gets rid of our reference to an existing socket. Most of the places where we disconnect the socket use this. After this, we end up with srcport == 0 which makes it pick a new port. xs_tcp_shutdown: which calls ->shutdown on it, but doesn't free anything. This also preserves the existing srcport. xs_abort_connection: calls kernel_connect to reconnect the socket to AF_UNSPEC address (effectively disconnecting it?). This also preserves the srcport. I guess we use this just before reconnecting when the remote end drops the connection, since we don't need to be graceful about tearing anything down at that point. The last one actually does reuse the same socket, so my thinking was that we could extend that scheme to the other cases. If we called ->shutdown on it and then reconnected it to AF_UNSPEC, would that "reset" it back to a usable state? If there really is no alternative to freeing the socket, then the only real fix I can see is to set PF_MEMALLOC when we go to create it and then reset it afterward. That's a pretty ugly fix though... -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
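The xs_abort_connection trick described above boils down to a kernel_connect() call against an AF_UNSPEC address. A rough sketch of the idea follows; this is a simplification for illustration, not a verbatim copy of the xprtsock.c code:
------------------------------------------------------------
/* Sketch of the AF_UNSPEC "disconnect" that xs_abort_connection relies
 * on. Connecting an established socket to an AF_UNSPEC address asks
 * the protocol to drop the association while keeping the socket (and
 * therefore the bound source port) around for reuse. Error handling
 * and locking are omitted.
 */
static int sketch_abort_connection(struct socket *sock)
{
	struct sockaddr any;

	memset(&any, 0, sizeof(any));
	any.sa_family = AF_UNSPEC;
	return kernel_connect(sock, &any, sizeof(any), 0);
}
------------------------------------------------------------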
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-27 19:28 ` Jeff Layton @ 2012-06-27 19:37 ` Myklebust, Trond 2012-06-27 20:18 ` Jeff Layton 2012-06-27 19:56 ` Myklebust, Trond 1 sibling, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-06-27 19:37 UTC (permalink / raw) To: Jeff Layton; +Cc: Harshula, linux-nfs

On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote:
> On Wed, 27 Jun 2012 18:43:56 +0000
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > The reason why we close these sockets is that if the attempt at aborting
> > the connection fails, then they typically end up in a TIME_WAIT state.
> 
> The last one actually does reuse the same socket, so my thinking was
> that we could extend that scheme to the other cases. If we called
> ->shutdown on it and then reconnected it to AF_UNSPEC, would that
> "reset" it back to a usable state?

Not that I'm aware of. The problem is that most of this stuff is
undocumented. For instance, the AF_UNSPEC reconnect is documented only
for UDP connections. While Linux implements it for TCP too, there is no
spec (that I'm aware of) that explains how that should work.

> If there really is no alternative to freeing the socket, then the only
> real fix I can see is to set PF_MEMALLOC when we go to create it and
> then reset it afterward. That's a pretty ugly fix though...

Agreed...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-27 19:37 ` Myklebust, Trond @ 2012-06-27 20:18 ` Jeff Layton 0 siblings, 0 replies; 23+ messages in thread From: Jeff Layton @ 2012-06-27 20:18 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Harshula, linux-nfs On Wed, 27 Jun 2012 19:37:03 +0000 "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote: > > On Wed, 27 Jun 2012 18:43:56 +0000 > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > > The reason why we close these sockets is that if the attempt at aborting > > > the connection fails, then they typically end up in a TIME_WAIT state. > > > > > > > I'm still trying to wade through the xprtsock.c socket handling code, > > but it looks like we currently tear down the connection in 3 different > > ways: > > > > xs_close: which basically calls sock_release and gets rid of our > > reference to an existing socket. Most of the places where we disconnect > > the socket use this. After this, we end up with srcport == 0 which > > makes it pick a new port. > > > > xs_tcp_shutdown: which calls ->shutdown on it, but doesn't free > > anything. This also preserves the existing srcport. > > > > xs_abort_connection: calls kernel_connect to reconnect the socket to > > AF_UNSPEC address (effectively disconnecting it?). This also preserves > > the srcport. I guess we use this just before reconnecting when the > > remote end drops the connection, since we don't need to be graceful > > about tearing anything down at that point. > > > > The last one actually does reuse the same socket, so my thinking was > > that we could extend that scheme to the other cases. If we called > > ->shutdown on it and then reconnected it to AF_UNSPEC, would that > > "reset" it back to a usable state? > > Not that I'm aware of. The problem is that most of this stuff is > undocumented. For instance, the AF_UNSPEC reconnect is documented only > for UDP connections. While Linux implements it for TCP too, there is no > spec (that I'm aware of) that explains how that should work. > ...and fwiw, it looks like reconnecting a TCP socket to an AF_UNSPEC address doesn't work from userland -- you get back EINVAL. I have to wonder if xs_abort_connection actually works as expected... > > If there really is no alternative to freeing the socket, then the only > > real fix I can see is to set PF_MEMALLOC when we go to create it and > > then reset it afterward. That's a pretty ugly fix though... > > Agreed... > That looks basically like what Mel is doing to work around the problem, though he only does it for xprt's that are tagged as being swapped over. We could just make that unconditional, but the "congested" flag scheme sounds better. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
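Jeff's userland observation is easy to check; a small probe like the one below (standard sockets API, nothing NFS-specific) reports whether a given kernel accepts an AF_UNSPEC re-connect on a connected TCP socket:
------------------------------------------------------------
/* Probe: connect a TCP socket somewhere, then try to "disconnect" it
 * by connecting to an AF_UNSPEC address. On the kernels discussed
 * here this reportedly fails with EINVAL from userland; adjust the
 * target address to something that accepts connections on your box.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
	struct sockaddr_in sin;
	struct sockaddr any;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(22);
	inet_pton(AF_INET, "127.0.0.1", &sin.sin_addr);
	if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		perror("connect");

	memset(&any, 0, sizeof(any));
	any.sa_family = AF_UNSPEC;
	if (connect(fd, &any, sizeof(any)) < 0)
		perror("connect(AF_UNSPEC)");	/* EINVAL in Jeff's test */

	close(fd);
	return 0;
}
------------------------------------------------------------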
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-27 19:28 ` Jeff Layton 2012-06-27 19:37 ` Myklebust, Trond @ 2012-06-27 19:56 ` Myklebust, Trond 2012-06-27 20:19 ` Jeff Layton 1 sibling, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-06-27 19:56 UTC (permalink / raw) To: Jeff Layton; +Cc: Harshula, linux-nfs

On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote:
> On Wed, 27 Jun 2012 18:43:56 +0000
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> If there really is no alternative to freeing the socket, then the only
> real fix I can see is to set PF_MEMALLOC when we go to create it and
> then reset it afterward. That's a pretty ugly fix though...

I can think of 2 possible alternatives:

     1. Use the PF_FSTRANS flag to tell nfs_release_page that this is a
        direct reclaim from rpciod.
     2. Add a 'congested' flag to the rpc client that could tell reclaim
        type operations to give up if a socket allocation is in
        progress.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-27 19:56 ` Myklebust, Trond @ 2012-06-27 20:19 ` Jeff Layton 2012-06-27 20:46 ` Myklebust, Trond 0 siblings, 1 reply; 23+ messages in thread From: Jeff Layton @ 2012-06-27 20:19 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Harshula, linux-nfs On Wed, 27 Jun 2012 19:56:38 +0000 "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote: > > On Wed, 27 Jun 2012 18:43:56 +0000 > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > If there really is no alternative to freeing the socket, then the only > > real fix I can see is to set PF_MEMALLOC when we go to create it and > > then reset it afterward. That's a pretty ugly fix though... > > I can think of 2 possible alternatives: > > 1. Use the PF_FSTRANS flag to tell nfs_release_page that this is a > direct reclaim from rpciod. Hmm...that's a bit indiscriminate, isn't it? Looks like only XFS uses that flag now... Suppose we're in some XFS allocation and then want to commit a NFS page to satisfy it. There's no real reason we couldn't but that would prevent it... > 2. Add a 'congested' flag to the rpc client that could tell reclaim > type operations to give up if a socket allocation is in > progress. > That's not a bad idea. That has the benefit of only skipping reclaim when it's directly associated with the socket being reconnected. Seems like that would have to hang off of the rpc_xprt, but we could do something along those lines. Presumably Mel's swap-over-nfs patches could be converted to do something similar instead of using PF_MEMALLOC there too. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
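To make option 2 concrete: the 'congested' hint would presumably be a flag bit on the rpc_xprt, tested by reclaim-driven paths before they commit to waiting. Nothing like this was actually posted in the thread, so the names below (XPRT_CONGESTED, xprt_alloc_in_progress) are invented purely for illustration:
------------------------------------------------------------
/* Hypothetical sketch of the 'congested' flag idea (discussed here but
 * never merged in this form). The connect worker would set the bit
 * around its socket allocation, and reclaim paths would test it.
 */
#define XPRT_CONGESTED	12	/* invented bit in xprt->state */

static bool xprt_alloc_in_progress(struct rpc_xprt *xprt)
{
	return test_bit(XPRT_CONGESTED, &xprt->state);
}

/* nfs_release_page() could then downgrade to a non-blocking commit:
 *
 *	if (current_is_kswapd() || xprt_alloc_in_progress(xprt))
 *		how = 0;
 *
 * As Jeff points out further down the thread, this test is racy: the
 * bit can flip right after it is checked but before the commit waits.
 */
------------------------------------------------------------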
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-27 20:19 ` Jeff Layton @ 2012-06-27 20:46 ` Myklebust, Trond 2012-06-28 13:19 ` Jeff Layton 0 siblings, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-06-27 20:46 UTC (permalink / raw) To: Jeff Layton; +Cc: Harshula, linux-nfs

On Wed, 2012-06-27 at 16:19 -0400, Jeff Layton wrote:
> On Wed, 27 Jun 2012 19:56:38 +0000
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> 
> >      1. Use the PF_FSTRANS flag to tell nfs_release_page that this is a
> >         direct reclaim from rpciod.
> 
> Hmm...that's a bit indiscriminate, isn't it? Looks like only XFS uses
> that flag now...
> 
> Suppose we're in some XFS allocation and then want to commit a NFS
> page to satisfy it. There's no real reason we couldn't but that would
> prevent it...

So do all the GFP_NOFS allocations. How is this any different?

> >      2. Add a 'congested' flag to the rpc client that could tell reclaim
> >         type operations to give up if a socket allocation is in
> >         progress.
> 
> That's not a bad idea. That has the benefit of only skipping reclaim
> when it's directly associated with the socket being reconnected. Seems
> like that would have to hang off of the rpc_xprt, but we could do
> something along those lines. Presumably Mel's swap-over-nfs patches
> could be converted to do something similar instead of using PF_MEMALLOC
> there too.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-27 20:46 ` Myklebust, Trond @ 2012-06-28 13:19 ` Jeff Layton 2012-06-28 13:40 ` Myklebust, Trond 0 siblings, 1 reply; 23+ messages in thread From: Jeff Layton @ 2012-06-28 13:19 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Harshula, linux-nfs, mgorman

On Wed, 27 Jun 2012 20:46:49 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Wed, 2012-06-27 at 16:19 -0400, Jeff Layton wrote:
> > On Wed, 27 Jun 2012 19:56:38 +0000
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote:
> > > > On Wed, 27 Jun 2012 18:43:56 +0000
> > > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > > > If there really is no alternative to freeing the socket, then the only
> > > > real fix I can see is to set PF_MEMALLOC when we go to create it and
> > > > then reset it afterward. That's a pretty ugly fix though...
> > > 
> > > I can think of 2 possible alternatives:
> > > 
> > >      1. Use the PF_FSTRANS flag to tell nfs_release_page that this is a
> > >         direct reclaim from rpciod.
> > 
> > Hmm...that's a bit indiscriminate, isn't it? Looks like only XFS uses
> > that flag now...
> > 
> > Suppose we're in some XFS allocation and then want to commit a NFS
> > page to satisfy it. There's no real reason we couldn't but that would
> > prevent it...
> 
> So do all the GFP_NOFS allocations. How is this any different?
> 

Good point. Now that I've looked closer, I think this is probably the
best approach. Proposed patch below. It builds but is otherwise
untested so far...

> > >      2. Add a 'congested' flag to the rpc client that could tell reclaim
> > >         type operations to give up if a socket allocation is in
> > >         progress.
> > 
> > That's not a bad idea. That has the benefit of only skipping reclaim
> > when it's directly associated with the socket being reconnected. Seems
> > like that would have to hang off of the rpc_xprt, but we could do
> > something along those lines. Presumably Mel's swap-over-nfs patches
> > could be converted to do something similar instead of using PF_MEMALLOC
> > there too.
> 

The problem with the above scheme is that it would be racy. It would
easily be possible for the congested flag to flip just after you
checked it but before issuing the commit.

Maybe we could push handling for that down into the commit call itself
-- allow it to give up if "congested" goes true while we're blocked and
waiting for the reply, but that sounds complicated...

Anyway, here's a proposed patch for the PF_FSTRANS scheme. It builds
but I haven't otherwise tested it yet. For now, it just covers the
socket allocation and ->releasepage. Eventually I guess we'll probably
want to set and check for PF_FSTRANS in other places:

-----------------[snip]------------------

[RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

The problem is pretty clear. rpciod is trying to allocate memory for a new
socket to talk to the server. The VM ends up calling ->releasepage to get
more memory, and it tries to do a blocking commit. That commit can't
succeed however without a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a blocking commit call.

Also, cc'ing Mel since I borrowed his tsk_restore_flags helper here, and
this patch may cause merge conflicts with some of his Swap-over-NFS
patches.

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/file.c                   |    8 ++++++--
 include/linux/sched.h           |    7 +++++++
 net/sunrpc/xprtrdma/transport.c |    4 +++-
 net/sunrpc/xprtsock.c           |   13 +++++++++++++
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a6708e6b..80cc621 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -463,8 +463,12 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
 		int how = FLUSH_SYNC;
 
-		/* Don't let kswapd deadlock waiting for OOM RPC calls */
-		if (current_is_kswapd())
+		/*
+		 * Don't let kswapd deadlock waiting for OOM RPC calls, and
+		 * don't recurse back into the fs if we know that we're trying
+		 * to reclaim memory for an fs-related allocation.
+		 */
+		if (current_is_kswapd() || current->flags & PF_FSTRANS)
 			how = 0;
 		nfs_commit_inode(mapping->host, how);
 	}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4059c0f..d71b523 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1889,6 +1889,13 @@ static inline void rcu_switch_from(struct task_struct *prev)
 
 #endif
 
+static inline void tsk_restore_flags(struct task_struct *p,
+			unsigned long pflags, unsigned long mask)
+{
+	p->flags &= ~mask;
+	p->flags |= pflags & mask;
+}
+
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p,
 			       const struct cpumask *new_mask);
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index b446e10..7b0f74b 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -197,9 +197,11 @@ xprt_rdma_connect_worker(struct work_struct *work)
 	struct rpcrdma_xprt *r_xprt =
 		container_of(work, struct rpcrdma_xprt, rdma_connect.work);
 	struct rpc_xprt *xprt = &r_xprt->xprt;
+	unsigned long pflags = current->flags;
 	int rc = 0;
 
 	if (!xprt->shutdown) {
+		current->flags |= PF_FSTRANS;
 		xprt_clear_connected(xprt);
 
 		dprintk("RPC:       %s: %sconnect\n", __func__,
@@ -212,10 +214,10 @@ xprt_rdma_connect_worker(struct work_struct *work)
 
 out:
 	xprt_wake_pending_tasks(xprt, rc);
-
 out_clear:
 	dprintk("RPC:       %s: exit\n", __func__);
 	xprt_clear_connecting(xprt);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 890b03f..6c7d8d5 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1890,11 +1890,14 @@ static void xs_local_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 	status = __sock_create(xprt->xprt_net, AF_LOCAL,
 					SOCK_STREAM, 0, &sock, 1);
@@ -1928,6 +1931,7 @@ static void xs_local_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
@@ -1965,11 +1969,14 @@ static void xs_udp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1988,6 +1995,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /*
@@ -2108,11 +2116,14 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2162,6 +2173,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	case -EINPROGRESS:
 	case -EALREADY:
 		xprt_clear_connecting(xprt);
+		tsk_restore_flags(current, pflags, PF_FSTRANS);
 		return;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
@@ -2174,6 +2186,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /**
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-28 13:19 ` Jeff Layton @ 2012-06-28 13:40 ` Myklebust, Trond 2012-06-28 14:47 ` Jeff Layton 2012-06-28 14:51 ` [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket Jeff Layton 0 siblings, 2 replies; 23+ messages in thread From: Myklebust, Trond @ 2012-06-28 13:40 UTC (permalink / raw) To: Jeff Layton; +Cc: Harshula, linux-nfs, mgorman

On Thu, 2012-06-28 at 09:19 -0400, Jeff Layton wrote:
> Anyway, here's a proposed patch for the PF_FSTRANS scheme. It builds
> but I haven't otherwise tested it yet. For now, it just covers the
> socket allocation and ->releasepage. Eventually I guess we'll probably
> want to set and check for PF_FSTRANS in other places:

Is there any reason why we might not want all rpciod work to run under
PF_FSTRANS? IOW: couldn't we just set/clear it unconditionally when
entering/exiting rpc_async_schedule(), and xs_*_setup_socket()?

We shouldn't need to worry about the remaining rpciod work structures in
rpc_timeout_upcall_queue and xprt_autoclose, since those don't ever need
to do any allocations.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete 2012-06-28 13:40 ` Myklebust, Trond @ 2012-06-28 14:47 ` Jeff Layton 2012-06-28 14:51 ` [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket Jeff Layton 1 sibling, 0 replies; 23+ messages in thread From: Jeff Layton @ 2012-06-28 14:47 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Harshula, linux-nfs, mgorman On Thu, 28 Jun 2012 13:40:37 +0000 "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > On Thu, 2012-06-28 at 09:19 -0400, Jeff Layton wrote: > > On Wed, 27 Jun 2012 20:46:49 +0000 > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > > > > On Wed, 2012-06-27 at 16:19 -0400, Jeff Layton wrote: > > > > On Wed, 27 Jun 2012 19:56:38 +0000 > > > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > > > > > > > > On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote: > > > > > > On Wed, 27 Jun 2012 18:43:56 +0000 > > > > > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > > > > > If there really is no alternative to freeing the socket, then the only > > > > > > real fix I can see is to set PF_MEMALLOC when we go to create it and > > > > > > then reset it afterward. That's a pretty ugly fix though... > > > > > > > > > > I can think of 2 possible alternatives: > > > > > > > > > > 1. Use the PF_FSTRANS flag to tell nfs_release_page that this is a > > > > > direct reclaim from rpciod. > > > > > > > > Hmm...that's a bit indiscriminate, isn't it? Looks like only XFS uses > > > > that flag now... > > > > > > > > Suppose we're in some XFS allocation and then want to commit a NFS > > > > page to satisfy it. There's no real reason we couldn't but that would > > > > prevent it... > > > > > > So do all the GFP_NOFS allocations. How is this any different? > > > > > > > Good point. Now that I've looked closer, I think this is probably the > > best approach. Proposed patch below. It builds but is otherwise > > untested so far... > > > > > > > 2. Add a 'congested' flag to the rpc client that could tell reclaim > > > > > type operations to give up if a socket allocation is in > > > > > progress. > > > > > > > > > > > > > That's not a bad idea. That has the benefit of only skipping reclaim > > > > when it's directly associated with the socket being reconnected. Seems > > > > like that would have to hang off of the rpc_xprt, but we could do > > > > something along those lines. Presumably Mel's swap-over-nfs patches > > > > could be converted to do something similar instead of using PF_MEMALLOC > > > > there too. > > > > > > > The problem with the above scheme is that it would be racy. It would > > easily be possible for the congested flag to flip just after you > > checked it but before issuing the commit. > > > > Maybe we could push handling for that down into the commit call itself > > -- allow it to give up if "congested" goes true while we're blocked and > > waiting for the the reply, but that sounds complicated... > > > > Anyway, here's a proposed patch for the PF_FSTRANS scheme. It builds > > but I haven't otherwise tested it yet. For now, it just covers the > > socket allocation and ->releasepage. Eventually I guess we'll probably > > want to set and check for PF_FSTRANS in other places: > > Is there any reason why we might not want all rpciod work to run under > PF_FSTRANS? IOW: couldn't we just set/clear it unconditionally when > entering/exiting rpc_async_schedule(), and xs_*_setup_socket()? 
> > We shouldn't need to worry about the remaining rpciod work structures in > rpc_timeout_upcall_queue and xprt_autoclose, since those don't ever need > to do any allocations. > I wasn't sure that that was really necessary. Just because we have rpciod sleeping and waiting for memory to be reclaimed doesn't necessarily mean that other RPCs can't go through. IIUC, the CMWQ implementation will spawn a new execution context as needed, even on a UP box, because we have WQ_MEM_RECLAIM set. The socket reconnect is a bit of a special case since that *has* to be done before anything can happen. Still, I suppose there is a possible corner case here. If we're blocked while in the middle of trying to commit, then the releasepage's commit might end up stuck behind the commit_lock. I'll send a revised patch separately... Thanks for the help so far... -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
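The CMWQ property Jeff is leaning on comes from the WQ_MEM_RECLAIM flag: a workqueue created with it is guaranteed a dedicated rescuer thread, so queued work keeps making forward progress even when no new worker threads can be spawned under memory pressure. Roughly how rpciod's workqueue is created in mainline of this era (exact flags may vary between versions):
------------------------------------------------------------
/* Sketch of rpciod's workqueue creation. WQ_MEM_RECLAIM reserves a
 * rescuer thread, which is why one stuck work item does not, by
 * itself, stop other queued RPC work from running.
 */
static struct workqueue_struct *rpciod_workqueue;

static int sketch_rpciod_start(void)
{
	rpciod_workqueue = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
	return rpciod_workqueue != NULL ? 0 : -ENOMEM;
}
------------------------------------------------------------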
* [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket 2012-06-28 13:40 ` Myklebust, Trond 2012-06-28 14:47 ` Jeff Layton @ 2012-06-28 14:51 ` Jeff Layton 2012-06-28 15:02 ` Myklebust, Trond 1 sibling, 1 reply; 23+ messages in thread From: Jeff Layton @ 2012-06-28 14:51 UTC (permalink / raw) To: Trond.Myklebust; +Cc: harshula, linux-nfs, mgorman We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca The problem is pretty clear. rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a blocking commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Also, cc'ing Mel since I borrowed his tsk_restore_flags helper here, and this patch may cause merge conflicts with some of his Swap-over-NFS patches. 

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/file.c                   |    8 ++++++--
 include/linux/sched.h           |    7 +++++++
 net/sunrpc/sched.c              |    4 ++++
 net/sunrpc/xprtrdma/transport.c |    4 +++-
 net/sunrpc/xprtsock.c           |   13 +++++++++++++
 5 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a6708e6b..80cc621 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -463,8 +463,12 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
 		int how = FLUSH_SYNC;
 
-		/* Don't let kswapd deadlock waiting for OOM RPC calls */
-		if (current_is_kswapd())
+		/*
+		 * Don't let kswapd deadlock waiting for OOM RPC calls, and
+		 * don't recurse back into the fs if we know that we're trying
+		 * to reclaim memory for an fs-related allocation.
+		 */
+		if (current_is_kswapd() || current->flags & PF_FSTRANS)
 			how = 0;
 		nfs_commit_inode(mapping->host, how);
 	}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4059c0f..d71b523 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1889,6 +1889,13 @@ static inline void rcu_switch_from(struct task_struct *prev)
 
 #endif
 
+static inline void tsk_restore_flags(struct task_struct *p,
+			unsigned long pflags, unsigned long mask)
+{
+	p->flags &= ~mask;
+	p->flags |= pflags & mask;
+}
+
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p,
 			       const struct cpumask *new_mask);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 994cfea..5c10ae6 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -790,7 +790,11 @@ void rpc_execute(struct rpc_task *task)
 
 static void rpc_async_schedule(struct work_struct *work)
 {
+	unsigned int pflags = current->flags;
+
+	current->flags |= PF_FSTRANS;
 	__rpc_execute(container_of(work, struct rpc_task, u.tk_work));
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /**
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index b446e10..7b0f74b 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -197,9 +197,11 @@ xprt_rdma_connect_worker(struct work_struct *work)
 	struct rpcrdma_xprt *r_xprt =
 		container_of(work, struct rpcrdma_xprt, rdma_connect.work);
 	struct rpc_xprt *xprt = &r_xprt->xprt;
+	unsigned long pflags = current->flags;
 	int rc = 0;
 
 	if (!xprt->shutdown) {
+		current->flags |= PF_FSTRANS;
 		xprt_clear_connected(xprt);
 
 		dprintk("RPC:       %s: %sconnect\n", __func__,
@@ -212,10 +214,10 @@ xprt_rdma_connect_worker(struct work_struct *work)
 
 out:
 	xprt_wake_pending_tasks(xprt, rc);
-
 out_clear:
 	dprintk("RPC:       %s: exit\n", __func__);
 	xprt_clear_connecting(xprt);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 890b03f..6c7d8d5 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1890,11 +1890,14 @@ static void xs_local_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 	status = __sock_create(xprt->xprt_net, AF_LOCAL,
 					SOCK_STREAM, 0, &sock, 1);
@@ -1928,6 +1931,7 @@ static void xs_local_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
@@ -1965,11 +1969,14 @@ static void xs_udp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1988,6 +1995,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /*
@@ -2108,11 +2116,14 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2162,6 +2173,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	case -EINPROGRESS:
 	case -EALREADY:
 		xprt_clear_connecting(xprt);
+		tsk_restore_flags(current, pflags, PF_FSTRANS);
 		return;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
@@ -2174,6 +2186,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /**
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread
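
The two halves of the patch meet in the releasepage hunk: the connect
workers advertise "this allocation is being made on behalf of the
filesystem" by setting PF_FSTRANS, and nfs_release_page() downgrades its
commit to non-blocking whenever that bit (or kswapd context) is visible.
Below is a minimal user-space sketch of that decision; the flag and
FLUSH_SYNC values are stand-ins and commit_how() only models the hunk
above, so treat it as an illustration rather than kernel code:

------------------------------------------------------------
#include <stdio.h>

#define PF_FSTRANS	0x00020000	/* stand-in value, not from kernel headers */
#define FLUSH_SYNC	1

static unsigned long current_flags;	/* stand-in for current->flags */

/* Models the nfs_release_page() hunk: a synchronous commit is allowed
 * only when we are neither kswapd nor inside an fs-related allocation. */
static int commit_how(int is_kswapd)
{
	int how = FLUSH_SYNC;

	if (is_kswapd || (current_flags & PF_FSTRANS))
		how = 0;			/* non-blocking commit */
	return how;
}

int main(void)
{
	/* Ordinary direct reclaim: safe to wait for the commit. */
	printf("plain reclaim:  how=%d\n", commit_how(0));	/* prints 1 */

	/* rpciod reconnecting: the worker sets PF_FSTRANS first, so any
	 * reclaim it triggers must not wait on commits that need the very
	 * socket it is trying to create. */
	current_flags |= PF_FSTRANS;
	printf("rpciod reclaim: how=%d\n", commit_how(0));	/* prints 0 */

	/* kswapd was already exempt before this patch. */
	current_flags = 0;
	printf("kswapd:         how=%d\n", commit_how(1));	/* prints 0 */
	return 0;
}
------------------------------------------------------------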

* Re: [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket

From: Myklebust, Trond @ 2012-06-28 15:02 UTC (permalink / raw)
To: Jeff Layton; +Cc: harshula, linux-nfs, mgorman

On Thu, 2012-06-28 at 10:51 -0400, Jeff Layton wrote:
> We've had some reports of a deadlock where rpciod ends up with a stack
> trace like this:
[...]
> @@ -463,8 +463,12 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
> 	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> 		int how = FLUSH_SYNC;
> 
> -		/* Don't let kswapd deadlock waiting for OOM RPC calls */
> -		if (current_is_kswapd())
> +		/*
> +		 * Don't let kswapd deadlock waiting for OOM RPC calls, and
> +		 * don't recurse back into the fs if we know that we're trying
> +		 * to reclaim memory for an fs-related allocation.
> +		 */
> +		if (current_is_kswapd() || current->flags & PF_FSTRANS)
> 			how = 0;
> 		nfs_commit_inode(mapping->host, how);

This will still try to do a direct reclaim. Even if we make that reclaim
non-blocking, I'm thinking that we should perhaps just skip the
nfs_commit_inode() in the case where PF_FSTRANS is set: if we're low on
memory, then it is better to skip the nfs_commit_inode rather than tie
up more resources in an attempt to free up memory that we _know_ is
going to be in vain.

> 	}
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 4059c0f..d71b523 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1889,6 +1889,13 @@ static inline void rcu_switch_from(struct task_struct *prev)
> 
>  #endif
> 
> +static inline void tsk_restore_flags(struct task_struct *p,
> +			unsigned long pflags, unsigned long mask)
> +{
> +	p->flags &= ~mask;
> +	p->flags |= pflags & mask;
> +}

Do we need this? rpciod is a workqueue thread, so in principle we
already _know_ that PF_FSTRANS is not going to be set.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

From: Jeff Layton @ 2012-06-28 15:25 UTC (permalink / raw)
To: Trond.Myklebust; +Cc: harshula, linux-nfs, mgorman

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed, however,
without a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule, since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/file.c                   |    7 +++++--
 net/sunrpc/sched.c              |    2 ++
 net/sunrpc/xprtrdma/transport.c |    3 ++-
 net/sunrpc/xprtsock.c           |   10 ++++++++++
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a6708e6b..9075769 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -459,8 +459,11 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 
 	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
 
-	/* Only do I/O if gfp is a superset of GFP_KERNEL */
-	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
+	/* Only do I/O if gfp is a superset of GFP_KERNEL, and we're not
+	 * doing this memory reclaim for a fs-related allocation.
+	 */
+	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL &&
+	    !(current->flags & PF_FSTRANS)) {
 		int how = FLUSH_SYNC;
 
 		/* Don't let kswapd deadlock waiting for OOM RPC calls */
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 994cfea..eda32ae 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -790,7 +790,9 @@ void rpc_execute(struct rpc_task *task)
 
 static void rpc_async_schedule(struct work_struct *work)
 {
+	current->flags |= PF_FSTRANS;
 	__rpc_execute(container_of(work, struct rpc_task, u.tk_work));
+	current->flags &= ~PF_FSTRANS;
 }
 
 /**
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index b446e10..06cdbff 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -200,6 +200,7 @@ xprt_rdma_connect_worker(struct work_struct *work)
 	int rc = 0;
 
 	if (!xprt->shutdown) {
+		current->flags |= PF_FSTRANS;
 		xprt_clear_connected(xprt);
 
 		dprintk("RPC:       %s: %sconnect\n", __func__,
@@ -212,10 +213,10 @@ xprt_rdma_connect_worker(struct work_struct *work)
 
 out:
 	xprt_wake_pending_tasks(xprt, rc);
-
 out_clear:
 	dprintk("RPC:       %s: exit\n", __func__);
 	xprt_clear_connecting(xprt);
+	current->flags &= ~PF_FSTRANS;
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 890b03f..b88c6bf 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1895,6 +1895,8 @@ static void xs_local_setup_socket(struct work_struct *work)
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 	status = __sock_create(xprt->xprt_net, AF_LOCAL,
 					SOCK_STREAM, 0, &sock, 1);
@@ -1928,6 +1930,7 @@ static void xs_local_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	current->flags &= ~PF_FSTRANS;
 }
 
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
@@ -1970,6 +1973,8 @@ static void xs_udp_setup_socket(struct work_struct *work)
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1988,6 +1993,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	current->flags &= ~PF_FSTRANS;
 }
 
 /*
@@ -2113,6 +2119,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2162,6 +2170,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	case -EINPROGRESS:
 	case -EALREADY:
 		xprt_clear_connecting(xprt);
+		current->flags &= ~PF_FSTRANS;
 		return;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
@@ -2174,6 +2183,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	current->flags &= ~PF_FSTRANS;
 }
 
 /**
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread
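
Two things change in v2 relative to v1: nfs_release_page() now skips the
commit outright when PF_FSTRANS is set, per Trond's comment above, and
the flag is cleared unconditionally on exit instead of being restored
with tsk_restore_flags, on the strength of his observation that a
workqueue thread never enters with the bit already set. The difference
between the two flag-handling styles is easy to show in isolation; the
flag value below is a stand-in and the workers are skeletons, not the
real connect paths:

------------------------------------------------------------
#include <assert.h>

#define PF_FSTRANS 0x00020000		/* stand-in value */

static unsigned long flags;		/* stand-in for current->flags */

static void worker_v1(void)		/* v1: save, set, restore */
{
	unsigned long pflags = flags;

	flags |= PF_FSTRANS;
	/* ... create socket, allocate memory ... */
	flags &= ~PF_FSTRANS;
	flags |= pflags & PF_FSTRANS;	/* tsk_restore_flags(..., PF_FSTRANS) */
}

static void worker_v2(void)		/* v2: set, clear unconditionally */
{
	flags |= PF_FSTRANS;
	/* ... create socket, allocate memory ... */
	flags &= ~PF_FSTRANS;
}

int main(void)
{
	flags = PF_FSTRANS;		/* caller already inside an fs transaction */
	worker_v1();
	assert(flags & PF_FSTRANS);	/* v1 preserves the caller's bit */

	flags = PF_FSTRANS;
	worker_v2();
	assert(!(flags & PF_FSTRANS));	/* v2 loses it -- harmless only because
					 * rpciod's workqueue threads are known
					 * to enter with the bit clear */
	return 0;
}
------------------------------------------------------------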

* Re: [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

From: Jeff Layton @ 2012-07-02 15:17 UTC (permalink / raw)
To: Trond.Myklebust; +Cc: harshula, linux-nfs, mgorman, jan.kratochvil

On Thu, 28 Jun 2012 11:25:13 -0400
Jeff Layton <jlayton@redhat.com> wrote:

> We've had some reports of a deadlock where rpciod ends up with a stack
> trace like this:
[...]
> out:
> 	xprt_clear_connecting(xprt);
> 	xprt_wake_pending_tasks(xprt, status);
> +	current->flags &= ~PF_FSTRANS;
> }
> 
> /**

I've given a backported version of the above patch to a customer and
they're testing it now. It may be some time before we have any results,
however.

In the meantime, someone reported this bug against Fedora 16, which is a
similar situation:

    https://bugzilla.redhat.com/show_bug.cgi?id=834808

Here's the stack trace of interest:

    # cat /proc/pid-of-openvpn/stack
    [<ffffffffa08484a9>] nfs_wait_bit_killable+0x39/0x90 [nfs]
    [<ffffffffa085737a>] nfs_commit_inode+0xaa/0x250 [nfs]
    [<ffffffffa0845f14>] nfs_release_page+0x84/0xa0 [nfs]
    [<ffffffff811205f2>] try_to_release_page+0x32/0x50
    [<ffffffff81133e62>] shrink_page_list+0x792/0x9a0
    [<ffffffff81134484>] shrink_inactive_list+0x184/0x4f0
    [<ffffffff81134fc8>] shrink_mem_cgroup_zone+0x448/0x5d0
    [<ffffffff811351c6>] shrink_zone+0x76/0xa0
    [<ffffffff81135564>] do_try_to_free_pages+0x104/0x5c0
    [<ffffffff81135cfb>] try_to_free_pages+0xab/0x170
    [<ffffffff811299dc>] __alloc_pages_nodemask+0x53c/0x8f0
    [<ffffffff81161273>] alloc_pages_current+0xa3/0x110
    [<ffffffff8116c2e5>] new_slab+0x245/0x2f0
    [<ffffffff815ed28d>] __slab_alloc+0x32b/0x440
    [<ffffffff8116f2b7>] __kmalloc_node_track_caller+0x97/0x1f0
    [<ffffffff814d9678>] __alloc_skb+0x78/0x230
    [<ffffffff814d3fbc>] sock_alloc_send_pskb+0x1ac/0x2f0
    [<ffffffff814d4115>] sock_alloc_send_skb+0x15/0x20
    [<ffffffff81520861>] __ip_append_data+0x711/0xa10
    [<ffffffff81522bd0>] ip_make_skb+0xd0/0x110
    [<ffffffff815464ef>] udp_sendmsg+0x2df/0x990
    [<ffffffff81550f24>] inet_sendmsg+0x64/0xb0
    [<ffffffff814cedc7>] sock_sendmsg+0x117/0x130
    [<ffffffff814d22bd>] sys_sendto+0x13d/0x190
    [<ffffffff815fc5e9>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

Here, they're mounting over openvpn and it deadlocks in a similar
fashion. The openvpn process is trying to get memory to send a frame on
the "physical" connection, but that won't happen until the commit comes
back.

Perhaps we should also take something like the (untested) patch below on
top of the patch above? Note that kswapd always has PF_MEMALLOC set:

-----------------[snip]--------------------

nfs: broaden the cases where we use a non-blocking commit in releasepage

Currently, we just do a non-blocking commit when called from kswapd, but
that still gives us some cases where we end up blocking after recursing
back into the filesystem. Instead, turn that check into a check for
PF_MEMALLOC so that we never block when trying to free memory in order to
satisfy an allocation.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/file.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 9075769..61d3670 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -466,8 +466,8 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 	    !(current->flags & PF_FSTRANS)) {
 		int how = FLUSH_SYNC;
 
-		/* Don't let kswapd deadlock waiting for OOM RPC calls */
-		if (current_is_kswapd())
+		/* Don't block if we're freeing for a memory allocation */
+		if (current->flags & PF_MEMALLOC)
 			how = 0;
 		nfs_commit_inode(mapping->host, how);
 	}
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread
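
Taken together, the v2 patch plus this follow-up leave nfs_release_page()
with three possible outcomes. A condensed, stand-alone sketch of the
combined decision follows; the flag values, names, and enum are
illustrative stand-ins, and the last comment anticipates the callers
discussed in the next two messages:

------------------------------------------------------------
#include <stdio.h>

#define PF_FSTRANS	0x00020000	/* stand-in values */
#define PF_MEMALLOC	0x00000800

enum commit_action { SKIP_COMMIT, COMMIT_NONBLOCK, COMMIT_SYNC };

/* The v2 check plus the PF_MEMALLOC follow-up, reduced to the bare
 * decision: skip entirely for fs-related reclaim, never block when
 * freeing memory for an allocation, otherwise commit synchronously. */
static enum commit_action releasepage_commit(unsigned long task_flags,
					     int gfp_allows_io)
{
	if (!gfp_allows_io)		/* gfp not a superset of GFP_KERNEL */
		return SKIP_COMMIT;
	if (task_flags & PF_FSTRANS)	/* committing would recurse into the
					 * deadlock, so don't even try */
		return SKIP_COMMIT;
	if (task_flags & PF_MEMALLOC)	/* freeing memory for an allocation;
					 * kswapd always has this bit set */
		return COMMIT_NONBLOCK;
	return COMMIT_SYNC;		/* e.g. migration, splice, invalidation */
}

int main(void)
{
	printf("%d %d %d\n",
	       releasepage_commit(PF_FSTRANS | PF_MEMALLOC, 1),	/* 0: skip */
	       releasepage_commit(PF_MEMALLOC, 1),	/* 1: non-blocking */
	       releasepage_commit(0, 1));		/* 2: synchronous */
	return 0;
}
------------------------------------------------------------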

* Re: [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

From: Myklebust, Trond @ 2012-07-02 19:40 UTC (permalink / raw)
To: Jeff Layton; +Cc: harshula, linux-nfs, mgorman, jan.kratochvil

On Mon, 2012-07-02 at 11:17 -0400, Jeff Layton wrote:
> nfs: broaden the cases where we use a non-blocking commit in releasepage
> 
> Currently, we just do a non-blocking commit when called from kswapd, but
> that still gives us some cases where we end up blocking after recursing
> back into the filesystem. Instead, turn that check into a check for
> PF_MEMALLOC so that we never block when trying to free memory in order to
> satisfy an allocation.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/nfs/file.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 9075769..61d3670 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -466,8 +466,8 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
> 	    !(current->flags & PF_FSTRANS)) {
> 		int how = FLUSH_SYNC;
> 
> -		/* Don't let kswapd deadlock waiting for OOM RPC calls */
> -		if (current_is_kswapd())
> +		/* Don't block if we're freeing for a memory allocation */
> +		if (current->flags & PF_MEMALLOC)
> 			how = 0;
> 		nfs_commit_inode(mapping->host, how);
> 	}

Umm... So which process will actually call nfs_release_page() with
GFP_KERNEL, but without the PF_MEMALLOC flag being set?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

From: Jeff Layton @ 2012-07-03 11:20 UTC (permalink / raw)
To: Myklebust, Trond; +Cc: harshula, linux-nfs, mgorman, jan.kratochvil

On Mon, 2 Jul 2012 19:40:22 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Mon, 2012-07-02 at 11:17 -0400, Jeff Layton wrote:
> > nfs: broaden the cases where we use a non-blocking commit in releasepage
[...]
> Umm... So which process will actually call nfs_release_page() with
> GFP_KERNEL, but without the PF_MEMALLOC flag being set?

Well... there are a number of callers of try_to_release_page with
GFP_KERNEL that are not involved with reclaim: the migrate_page
codepaths and the splice code, for instance. Also
invalidate_complete_page2, which we call when invalidating an inode and
maybe also when truncating? Those are almost certainly less traveled
than the reclaim codepaths though.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

Thread overview: 23+ messages

2012-06-15 12:54 rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete Harshula
2012-06-15 13:21 ` Jeff Layton
2012-06-15 21:25 ` Myklebust, Trond
2012-06-20 14:14 ` Jeff Layton
2012-06-28 14:31 ` Mel Gorman
2012-06-28 14:55 ` Jeff Layton
2012-06-27 15:54 ` Jeff Layton
2012-06-27 18:43 ` Myklebust, Trond
2012-06-27 19:28 ` Jeff Layton
2012-06-27 19:37 ` Myklebust, Trond
2012-06-27 20:18 ` Jeff Layton
2012-06-27 19:56 ` Myklebust, Trond
2012-06-27 20:19 ` Jeff Layton
2012-06-27 20:46 ` Myklebust, Trond
2012-06-28 13:19 ` Jeff Layton
2012-06-28 13:40 ` Myklebust, Trond
2012-06-28 14:47 ` Jeff Layton
2012-06-28 14:51 ` [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket Jeff Layton
2012-06-28 15:02 ` Myklebust, Trond
2012-06-28 15:25 ` [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons Jeff Layton
2012-07-02 15:17 ` Jeff Layton
2012-07-02 19:40 ` Myklebust, Trond
2012-07-03 11:20 ` Jeff Layton