All of lore.kernel.org
 help / color / mirror / Atom feed
* rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
@ 2012-06-15 12:54 Harshula
  2012-06-15 13:21 ` Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Harshula @ 2012-06-15 12:54 UTC (permalink / raw)
  To: linux-nfs; +Cc: jlayton

Hi All,

Can I please get your feedback on the following?

rpciod is blocked:
------------------------------------------------------------
crash> bt  2507 
PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
 #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
 #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
 #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
 #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
 #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
#24 [ffff8810343bfee8] kthread at ffffffff8108dd96
#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
------------------------------------------------------------

nfs_release_page() gives kswapd process an exemption from being blocked.
Should we do the same for rpciod. Otherwise rpciod could end up in a
deadlock where it can not continue without freeing memory that will only
become available when rpciod does its work:
------------------------------------------------------------
479 /*
480  * Attempt to release the private state associated with a page
481  * - Called if either PG_private or PG_fscache is set on the page
482  * - Caller holds page lock
483  * - Return true (may release page) or false (may not)
484  */
485 static int nfs_release_page(struct page *page, gfp_t gfp)
486 {   
487         struct address_space *mapping = page->mapping;
488     
489         dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
490     
491         /* Only do I/O if gfp is a superset of GFP_KERNEL */
492         if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
493                 int how = FLUSH_SYNC;
494             
495                 /* Don't let kswapd deadlock waiting for OOM RPC calls */
496                 if (current_is_kswapd())
497                         how = 0;
498                 nfs_commit_inode(mapping->host, how);
499         }
500         /* If PagePrivate() is set, then the page is not freeable */
501         if (PagePrivate(page))
502                 return 0;
503         return nfs_fscache_release_page(page, gfp);
504 }
------------------------------------------------------------

Another option is to change the priority of the memory allocation:
net/ipv4/af_inet.c
------------------------------------------------------------
 265 static int inet_create(struct net *net, struct socket *sock, int
protocol,
 266                        int kern)
 267 {
...
 345         sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);
------------------------------------------------------------
Considering this is generic net code, I assume the GFP_KERNEL will not
be replaced with GFP_ATOMIC.

NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses
'legacy' workqueue code.

cya,
#


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-15 12:54 rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete Harshula
@ 2012-06-15 13:21 ` Jeff Layton
  2012-06-15 21:25   ` Myklebust, Trond
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2012-06-15 13:21 UTC (permalink / raw)
  To: Harshula; +Cc: linux-nfs

On Fri, 15 Jun 2012 22:54:10 +1000
Harshula <harshula@redhat.com> wrote:

> Hi All,
> 
> Can I please get your feedback on the following?
> 
> rpciod is blocked:
> ------------------------------------------------------------
> crash> bt  2507 
> PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
>  #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
>  #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
>  #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
>  #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
>  #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
>  #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
>  #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
>  #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
>  #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
>  #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
> #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
> #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
> #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
> #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
> #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
> #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
> #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
> #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
> #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
> #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
> #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
> #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
> #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
> #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
> #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
> #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
> ------------------------------------------------------------
> 
> nfs_release_page() gives kswapd process an exemption from being blocked.
> Should we do the same for rpciod. Otherwise rpciod could end up in a
> deadlock where it can not continue without freeing memory that will only
> become available when rpciod does its work:
> ------------------------------------------------------------
> 479 /*
> 480  * Attempt to release the private state associated with a page
> 481  * - Called if either PG_private or PG_fscache is set on the page
> 482  * - Caller holds page lock
> 483  * - Return true (may release page) or false (may not)
> 484  */
> 485 static int nfs_release_page(struct page *page, gfp_t gfp)
> 486 {   
> 487         struct address_space *mapping = page->mapping;
> 488     
> 489         dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
> 490     
> 491         /* Only do I/O if gfp is a superset of GFP_KERNEL */
> 492         if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> 493                 int how = FLUSH_SYNC;
> 494             
> 495                 /* Don't let kswapd deadlock waiting for OOM RPC calls */
> 496                 if (current_is_kswapd())
> 497                         how = 0;
> 498                 nfs_commit_inode(mapping->host, how);
> 499         }
> 500         /* If PagePrivate() is set, then the page is not freeable */
> 501         if (PagePrivate(page))
> 502                 return 0;
> 503         return nfs_fscache_release_page(page, gfp);
> 504 }
> ------------------------------------------------------------
> 
> Another option is to change the priority of the memory allocation:
> net/ipv4/af_inet.c
> ------------------------------------------------------------
>  265 static int inet_create(struct net *net, struct socket *sock, int
> protocol,
>  266                        int kern)
>  267 {
> ...
>  345         sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);
> ------------------------------------------------------------
> Considering this is generic net code, I assume the GFP_KERNEL will not
> be replaced with GFP_ATOMIC.
> 
> NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses
> 'legacy' workqueue code.
> 
> cya,
> #
> 

I suspect this is also a problem in mainline, but maybe some of the
recent writeback changes prevent it...

I think the right solution here is to make nfs_release_page treat rpciod
similarly to kswapd. Easier said than done though -- you'll need to
come up with a way to determine if you're running in rpciod context...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-15 13:21 ` Jeff Layton
@ 2012-06-15 21:25   ` Myklebust, Trond
  2012-06-20 14:14     ` Jeff Layton
  2012-06-27 15:54     ` Jeff Layton
  0 siblings, 2 replies; 23+ messages in thread
From: Myklebust, Trond @ 2012-06-15 21:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Harshula, linux-nfs

T24gRnJpLCAyMDEyLTA2LTE1IGF0IDA5OjIxIC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
T24gRnJpLCAxNSBKdW4gMjAxMiAyMjo1NDoxMCArMTAwMA0KPiBIYXJzaHVsYSA8aGFyc2h1bGFA
cmVkaGF0LmNvbT4gd3JvdGU6DQo+IA0KPiA+IEhpIEFsbCwNCj4gPiANCj4gPiBDYW4gSSBwbGVh
c2UgZ2V0IHlvdXIgZmVlZGJhY2sgb24gdGhlIGZvbGxvd2luZz8NCj4gPiANCj4gPiBycGNpb2Qg
aXMgYmxvY2tlZDoNCj4gPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0NCj4gPiBjcmFzaD4gYnQgIDI1MDcgDQo+ID4gUElEOiAyNTA3
ICAgVEFTSzogZmZmZjg4MTAzNjkxYWI0MCAgQ1BVOiAxNCAgQ09NTUFORDogInJwY2lvZC8xNCIN
Cj4gPiAgIzAgW2ZmZmY4ODEwMzQzYmYyZjBdIHNjaGVkdWxlIGF0IGZmZmZmZmZmODE0ZGFiZDkN
Cj4gPiAgIzEgW2ZmZmY4ODEwMzQzYmYzYjhdIG5mc193YWl0X2JpdF9raWxsYWJsZSBhdCBmZmZm
ZmZmZmEwMzhmYzA0IFtuZnNdDQo+ID4gICMyIFtmZmZmODgxMDM0M2JmM2M4XSBfX3dhaXRfb25f
Yml0IGF0IGZmZmZmZmZmODE0ZGJjMmYNCj4gPiAgIzMgW2ZmZmY4ODEwMzQzYmY0MThdIG91dF9v
Zl9saW5lX3dhaXRfb25fYml0IGF0IGZmZmZmZmZmODE0ZGJjZDgNCj4gPiAgIzQgW2ZmZmY4ODEw
MzQzYmY0ODhdIG5mc19jb21taXRfaW5vZGUgYXQgZmZmZmZmZmZhMDM5ZTBjMSBbbmZzXQ0KPiA+
ICAjNSBbZmZmZjg4MTAzNDNiZjRmOF0gbmZzX3JlbGVhc2VfcGFnZSBhdCBmZmZmZmZmZmEwMzhi
ZWY2IFtuZnNdDQo+ID4gICM2IFtmZmZmODgxMDM0M2JmNTI4XSB0cnlfdG9fcmVsZWFzZV9wYWdl
IGF0IGZmZmZmZmZmODExMGM2NzANCj4gPiAgIzcgW2ZmZmY4ODEwMzQzYmY1MzhdIHNocmlua19w
YWdlX2xpc3QuY2xvbmUuMCBhdCBmZmZmZmZmZjgxMTI2MjcxDQo+ID4gICM4IFtmZmZmODgxMDM0
M2JmNjY4XSBzaHJpbmtfaW5hY3RpdmVfbGlzdCBhdCBmZmZmZmZmZjgxMTI2NjM4DQo+ID4gICM5
IFtmZmZmODgxMDM0M2JmODE4XSBzaHJpbmtfem9uZSBhdCBmZmZmZmZmZjgxMTI3ODhmDQo+ID4g
IzEwIFtmZmZmODgxMDM0M2JmOGM4XSBkb190cnlfdG9fZnJlZV9wYWdlcyBhdCBmZmZmZmZmZjgx
MTI3YjFlDQo+ID4gIzExIFtmZmZmODgxMDM0M2JmOTU4XSB0cnlfdG9fZnJlZV9wYWdlcyBhdCBm
ZmZmZmZmZjgxMTI4MTJmDQo+ID4gIzEyIFtmZmZmODgxMDM0M2JmYTA4XSBfX2FsbG9jX3BhZ2Vz
X25vZGVtYXNrIGF0IGZmZmZmZmZmODExMWZkYWQNCj4gPiAjMTMgW2ZmZmY4ODEwMzQzYmZiMjhd
IGttZW1fZ2V0cGFnZXMgYXQgZmZmZmZmZmY4MTE1OTk0Mg0KPiA+ICMxNCBbZmZmZjg4MTAzNDNi
ZmI1OF0gZmFsbGJhY2tfYWxsb2MgYXQgZmZmZmZmZmY4MTE1YTU1YQ0KPiA+ICMxNSBbZmZmZjg4
MTAzNDNiZmJkOF0gX19fX2NhY2hlX2FsbG9jX25vZGUgYXQgZmZmZmZmZmY4MTE1YTJkOQ0KPiA+
ICMxNiBbZmZmZjg4MTAzNDNiZmMzOF0ga21lbV9jYWNoZV9hbGxvYyBhdCBmZmZmZmZmZjgxMTVi
MDliDQo+ID4gIzE3IFtmZmZmODgxMDM0M2JmYzc4XSBza19wcm90X2FsbG9jIGF0IGZmZmZmZmZm
ODE0MTE4MDgNCj4gPiAjMTggW2ZmZmY4ODEwMzQzYmZjYjhdIHNrX2FsbG9jIGF0IGZmZmZmZmZm
ODE0MTE5N2MNCj4gPiAjMTkgW2ZmZmY4ODEwMzQzYmZjZThdIGluZXRfY3JlYXRlIGF0IGZmZmZm
ZmZmODE0ODNiYTYNCj4gPiAjMjAgW2ZmZmY4ODEwMzQzYmZkMzhdIF9fc29ja19jcmVhdGUgYXQg
ZmZmZmZmZmY4MTQwYjRhNw0KPiA+ICMyMSBbZmZmZjg4MTAzNDNiZmQ5OF0geHNfY3JlYXRlX3Nv
Y2sgYXQgZmZmZmZmZmZhMDFmNjQ5YiBbc3VucnBjXQ0KPiA+ICMyMiBbZmZmZjg4MTAzNDNiZmRk
OF0geHNfdGNwX3NldHVwX3NvY2tldCBhdCBmZmZmZmZmZmEwMWY2OTY1IFtzdW5ycGNdDQo+ID4g
IzIzIFtmZmZmODgxMDM0M2JmZTM4XSB3b3JrZXJfdGhyZWFkIGF0IGZmZmZmZmZmODEwODg3ZDAN
Cj4gPiAjMjQgW2ZmZmY4ODEwMzQzYmZlZThdIGt0aHJlYWQgYXQgZmZmZmZmZmY4MTA4ZGQ5Ng0K
PiA+ICMyNSBbZmZmZjg4MTAzNDNiZmY0OF0ga2VybmVsX3RocmVhZCBhdCBmZmZmZmZmZjgxMDBj
MWNhDQo+ID4gLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tDQo+ID4gDQo+ID4gbmZzX3JlbGVhc2VfcGFnZSgpIGdpdmVzIGtzd2FwZCBw
cm9jZXNzIGFuIGV4ZW1wdGlvbiBmcm9tIGJlaW5nIGJsb2NrZWQuDQo+ID4gU2hvdWxkIHdlIGRv
IHRoZSBzYW1lIGZvciBycGNpb2QuIE90aGVyd2lzZSBycGNpb2QgY291bGQgZW5kIHVwIGluIGEN
Cj4gPiBkZWFkbG9jayB3aGVyZSBpdCBjYW4gbm90IGNvbnRpbnVlIHdpdGhvdXQgZnJlZWluZyBt
ZW1vcnkgdGhhdCB3aWxsIG9ubHkNCj4gPiBiZWNvbWUgYXZhaWxhYmxlIHdoZW4gcnBjaW9kIGRv
ZXMgaXRzIHdvcms6DQo+ID4gLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tDQo+ID4gNDc5IC8qDQo+ID4gNDgwICAqIEF0dGVtcHQgdG8g
cmVsZWFzZSB0aGUgcHJpdmF0ZSBzdGF0ZSBhc3NvY2lhdGVkIHdpdGggYSBwYWdlDQo+ID4gNDgx
ICAqIC0gQ2FsbGVkIGlmIGVpdGhlciBQR19wcml2YXRlIG9yIFBHX2ZzY2FjaGUgaXMgc2V0IG9u
IHRoZSBwYWdlDQo+ID4gNDgyICAqIC0gQ2FsbGVyIGhvbGRzIHBhZ2UgbG9jaw0KPiA+IDQ4MyAg
KiAtIFJldHVybiB0cnVlIChtYXkgcmVsZWFzZSBwYWdlKSBvciBmYWxzZSAobWF5IG5vdCkNCj4g
PiA0ODQgICovDQo+ID4gNDg1IHN0YXRpYyBpbnQgbmZzX3JlbGVhc2VfcGFnZShzdHJ1Y3QgcGFn
ZSAqcGFnZSwgZ2ZwX3QgZ2ZwKQ0KPiA+IDQ4NiB7ICAgDQo+ID4gNDg3ICAgICAgICAgc3RydWN0
IGFkZHJlc3Nfc3BhY2UgKm1hcHBpbmcgPSBwYWdlLT5tYXBwaW5nOw0KPiA+IDQ4OCAgICAgDQo+
ID4gNDg5ICAgICAgICAgZGZwcmludGsoUEFHRUNBQ0hFLCAiTkZTOiByZWxlYXNlX3BhZ2UoJXAp
XG4iLCBwYWdlKTsNCj4gPiA0OTAgICAgIA0KPiA+IDQ5MSAgICAgICAgIC8qIE9ubHkgZG8gSS9P
IGlmIGdmcCBpcyBhIHN1cGVyc2V0IG9mIEdGUF9LRVJORUwgKi8NCj4gPiA0OTIgICAgICAgICBp
ZiAobWFwcGluZyAmJiAoZ2ZwICYgR0ZQX0tFUk5FTCkgPT0gR0ZQX0tFUk5FTCkgew0KPiA+IDQ5
MyAgICAgICAgICAgICAgICAgaW50IGhvdyA9IEZMVVNIX1NZTkM7DQo+ID4gNDk0ICAgICAgICAg
ICAgIA0KPiA+IDQ5NSAgICAgICAgICAgICAgICAgLyogRG9uJ3QgbGV0IGtzd2FwZCBkZWFkbG9j
ayB3YWl0aW5nIGZvciBPT00gUlBDIGNhbGxzICovDQo+ID4gNDk2ICAgICAgICAgICAgICAgICBp
ZiAoY3VycmVudF9pc19rc3dhcGQoKSkNCj4gPiA0OTcgICAgICAgICAgICAgICAgICAgICAgICAg
aG93ID0gMDsNCj4gPiA0OTggICAgICAgICAgICAgICAgIG5mc19jb21taXRfaW5vZGUobWFwcGlu
Zy0+aG9zdCwgaG93KTsNCj4gPiA0OTkgICAgICAgICB9DQo+ID4gNTAwICAgICAgICAgLyogSWYg
UGFnZVByaXZhdGUoKSBpcyBzZXQsIHRoZW4gdGhlIHBhZ2UgaXMgbm90IGZyZWVhYmxlICovDQo+
ID4gNTAxICAgICAgICAgaWYgKFBhZ2VQcml2YXRlKHBhZ2UpKQ0KPiA+IDUwMiAgICAgICAgICAg
ICAgICAgcmV0dXJuIDA7DQo+ID4gNTAzICAgICAgICAgcmV0dXJuIG5mc19mc2NhY2hlX3JlbGVh
c2VfcGFnZShwYWdlLCBnZnApOw0KPiA+IDUwNCB9DQo+ID4gLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tDQo+ID4gDQo+ID4gQW5vdGhl
ciBvcHRpb24gaXMgdG8gY2hhbmdlIHRoZSBwcmlvcml0eSBvZiB0aGUgbWVtb3J5IGFsbG9jYXRp
b246DQo+ID4gbmV0L2lwdjQvYWZfaW5ldC5jDQo+ID4gLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tDQo+ID4gIDI2NSBzdGF0aWMgaW50
IGluZXRfY3JlYXRlKHN0cnVjdCBuZXQgKm5ldCwgc3RydWN0IHNvY2tldCAqc29jaywgaW50DQo+
ID4gcHJvdG9jb2wsDQo+ID4gIDI2NiAgICAgICAgICAgICAgICAgICAgICAgIGludCBrZXJuKQ0K
PiA+ICAyNjcgew0KPiA+IC4uLg0KPiA+ICAzNDUgICAgICAgICBzayA9IHNrX2FsbG9jKG5ldCwg
UEZfSU5FVCwgR0ZQX0tFUk5FTCwgYW5zd2VyX3Byb3QpOw0KPiA+IC0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQ0KPiA+IENvbnNpZGVy
aW5nIHRoaXMgaXMgZ2VuZXJpYyBuZXQgY29kZSwgSSBhc3N1bWUgdGhlIEdGUF9LRVJORUwgd2ls
bCBub3QNCj4gPiBiZSByZXBsYWNlZCB3aXRoIEdGUF9BVE9NSUMuDQo+ID4gDQo+ID4gTk9URSwg
dGhpcyBpcyBvbiBSSEVMIDYuMSBrZXJuZWwgMi42LjMyLTEzMS42LjEgYW5kIGFwcGFyZW50bHkg
dXNlcw0KPiA+ICdsZWdhY3knIHdvcmtxdWV1ZSBjb2RlLg0KPiA+IA0KPiA+IGN5YSwNCj4gPiAj
DQo+ID4gDQo+IA0KPiBJIHN1c3BlY3QgdGhpcyBpcyBhbHNvIGEgcHJvYmxlbSBpbiBtYWlubGlu
ZSwgYnV0IG1heWJlIHNvbWUgb2YgdGhlDQo+IHJlY2VudCB3cml0ZWJhY2sgY2hhbmdlcyBwcmV2
ZW50IGl0Li4uDQo+IA0KPiBJIHRoaW5rIHRoZSByaWdodCBzb2x1dGlvbiBoZXJlIGlzIHRvIG1h
a2UgbmZzX3JlbGVhc2VfcGFnZSB0cmVhdCBycGNpb2QNCj4gc2ltaWxhcmx5IHRvIGtzd2FwZC4g
RWFzaWVyIHNhaWQgdGhhbiBkb25lIHRob3VnaCAtLSB5b3UnbGwgbmVlZCB0bw0KPiBjb21lIHVw
IHdpdGggYSB3YXkgdG8gZGV0ZXJtaW5lIGlmIHlvdSdyZSBydW5uaW5nIGluIHJwY2lvZCBjb250
ZXh0Li4uDQoNCk5vLiBUaGUgX3JpZ2h0XyBzb2x1dGlvbiBpcyB0byBlbnN1cmUgdGhhdCBycGNp
b2QgZG9lc24ndCBkbyBhbGxvY2F0aW9ucw0KdGhhdCByZXN1bHQgaW4gYSBwYWdlIHJlY2xhaW0u
Li4gdHJ5X3RvX3JlbGVhc2VfcGFnZSgpIGlzIGp1c3QgdGhlIHRpcA0Kb2YgdGhlIGljZWJlcmcg
b2YgY3JhenkgZGVhZGxvY2tzIHRoYXQgdGhpcyBzb2NrZXQgYWxsb2NhdGlvbiBjYW4gZ2V0IHVz
DQppbnRvLg0KDQpVbmZvcnR1bmF0ZWx5LCBzZWxpbnV4ICYgY28uIHByZXZlbnQgdXMgZnJvbSBh
bGxvY2F0aW5nIHRoZSBzb2NrZXRzIGluDQp1c2VyIGNvbnRleHRzLCBhbmQgYW55d2F5LCBoYXZp
bmcgdG8gd2FpdCBmb3IgYW5vdGhlciB0aHJlYWQgdG8gZG8gdGhlDQpzYW1lIGFsbG9jYXRpb24g
aXNuJ3QgZG9pbmcgdG8gaGVscCBwcmV2ZW50IHRoZSBkZWFkbG9jay4uLg0KDQpJIGtub3cgdGhh
dCBNZWwgR29ybWFuJ3MgTkZTIHN3YXAgcGF0Y2hlcyBoYWQgc29tZSBwcm90ZWN0aW9ucyBhZ2Fp
bnN0DQp0aGlzIHNvcnQgb2YgZGVhZGxvY2suIFBlcmhhcHMgd2UgY2FuIGxvb2sgYXQgaG93IGhl
IHdhcyBkb2luZyB0aGlzPw0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNsaWVu
dCBtYWludGFpbmVyDQoNCk5ldEFwcA0KVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20NCnd3dy5u
ZXRhcHAuY29tDQoNCg==

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-15 21:25   ` Myklebust, Trond
@ 2012-06-20 14:14     ` Jeff Layton
  2012-06-28 14:31       ` Mel Gorman
  2012-06-27 15:54     ` Jeff Layton
  1 sibling, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2012-06-20 14:14 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Harshula, linux-nfs, mgorman

On Fri, 15 Jun 2012 21:25:10 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote:
> > On Fri, 15 Jun 2012 22:54:10 +1000
> > Harshula <harshula@redhat.com> wrote:
> > 
> > > Hi All,
> > > 
> > > Can I please get your feedback on the following?
> > > 
> > > rpciod is blocked:
> > > ------------------------------------------------------------
> > > crash> bt  2507 
> > > PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
> > >  #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
> > >  #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
> > >  #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
> > >  #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
> > >  #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
> > >  #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
> > >  #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
> > >  #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
> > >  #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
> > >  #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
> > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
> > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
> > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
> > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
> > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
> > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
> > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
> > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
> > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
> > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
> > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
> > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
> > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
> > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
> > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
> > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
> > > ------------------------------------------------------------
> > > 
> > > nfs_release_page() gives kswapd process an exemption from being blocked.
> > > Should we do the same for rpciod. Otherwise rpciod could end up in a
> > > deadlock where it can not continue without freeing memory that will only
> > > become available when rpciod does its work:
> > > ------------------------------------------------------------
> > > 479 /*
> > > 480  * Attempt to release the private state associated with a page
> > > 481  * - Called if either PG_private or PG_fscache is set on the page
> > > 482  * - Caller holds page lock
> > > 483  * - Return true (may release page) or false (may not)
> > > 484  */
> > > 485 static int nfs_release_page(struct page *page, gfp_t gfp)
> > > 486 {   
> > > 487         struct address_space *mapping = page->mapping;
> > > 488     
> > > 489         dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
> > > 490     
> > > 491         /* Only do I/O if gfp is a superset of GFP_KERNEL */
> > > 492         if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> > > 493                 int how = FLUSH_SYNC;
> > > 494             
> > > 495                 /* Don't let kswapd deadlock waiting for OOM RPC calls */
> > > 496                 if (current_is_kswapd())
> > > 497                         how = 0;
> > > 498                 nfs_commit_inode(mapping->host, how);
> > > 499         }
> > > 500         /* If PagePrivate() is set, then the page is not freeable */
> > > 501         if (PagePrivate(page))
> > > 502                 return 0;
> > > 503         return nfs_fscache_release_page(page, gfp);
> > > 504 }
> > > ------------------------------------------------------------
> > > 
> > > Another option is to change the priority of the memory allocation:
> > > net/ipv4/af_inet.c
> > > ------------------------------------------------------------
> > >  265 static int inet_create(struct net *net, struct socket *sock, int
> > > protocol,
> > >  266                        int kern)
> > >  267 {
> > > ...
> > >  345         sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);
> > > ------------------------------------------------------------
> > > Considering this is generic net code, I assume the GFP_KERNEL will not
> > > be replaced with GFP_ATOMIC.
> > > 
> > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses
> > > 'legacy' workqueue code.
> > > 
> > > cya,
> > > #
> > > 
> > 
> > I suspect this is also a problem in mainline, but maybe some of the
> > recent writeback changes prevent it...
> > 
> > I think the right solution here is to make nfs_release_page treat rpciod
> > similarly to kswapd. Easier said than done though -- you'll need to
> > come up with a way to determine if you're running in rpciod context...
> 
> No. The _right_ solution is to ensure that rpciod doesn't do allocations
> that result in a page reclaim... try_to_release_page() is just the tip
> of the iceberg of crazy deadlocks that this socket allocation can get us
> into.
> 
> Unfortunately, selinux & co. prevent us from allocating the sockets in
> user contexts, and anyway, having to wait for another thread to do the
> same allocation isn't doing to help prevent the deadlock...
> 

My thinking was that we could return without releasing the page, and
eventually kmalloc would get to one that was able to be reclaimed via
other means. That's probably too optimistic though...

> I know that Mel Gorman's NFS swap patches had some protections against
> this sort of deadlock. Perhaps we can look at how he was doing this?
> 

(cc'ing Mel in case he has thoughts)

I looked over Mel's patches and they mainly seem to affect how memory
for the socket is handled after it has been created. In this case, we're
hanging on the allocation of a new socket altogether.

It does do this before calling xs_create_sock however:

+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+

Perhaps we should just set PF_MEMALLOC unconditionally there? We might
still need Mel's other patches that add SOCK_MEMALLOC to ensure that it
can actually be connected, but that's a bit of a different problem.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-15 21:25   ` Myklebust, Trond
  2012-06-20 14:14     ` Jeff Layton
@ 2012-06-27 15:54     ` Jeff Layton
  2012-06-27 18:43       ` Myklebust, Trond
  1 sibling, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2012-06-27 15:54 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Harshula, linux-nfs

On Fri, 15 Jun 2012 21:25:10 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote:
> > On Fri, 15 Jun 2012 22:54:10 +1000
> > Harshula <harshula@redhat.com> wrote:
> > 
> > > Hi All,
> > > 
> > > Can I please get your feedback on the following?
> > > 
> > > rpciod is blocked:
> > > ------------------------------------------------------------
> > > crash> bt  2507 
> > > PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
> > >  #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
> > >  #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
> > >  #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
> > >  #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
> > >  #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
> > >  #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
> > >  #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
> > >  #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
> > >  #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
> > >  #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
> > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
> > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
> > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
> > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
> > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
> > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
> > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
> > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
> > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
> > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
> > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
> > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
> > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
> > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
> > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
> > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
> > > ------------------------------------------------------------
> > > 
> > > nfs_release_page() gives kswapd process an exemption from being blocked.
> > > Should we do the same for rpciod. Otherwise rpciod could end up in a
> > > deadlock where it can not continue without freeing memory that will only
> > > become available when rpciod does its work:
> > > ------------------------------------------------------------
> > > 479 /*
> > > 480  * Attempt to release the private state associated with a page
> > > 481  * - Called if either PG_private or PG_fscache is set on the page
> > > 482  * - Caller holds page lock
> > > 483  * - Return true (may release page) or false (may not)
> > > 484  */
> > > 485 static int nfs_release_page(struct page *page, gfp_t gfp)
> > > 486 {   
> > > 487         struct address_space *mapping = page->mapping;
> > > 488     
> > > 489         dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
> > > 490     
> > > 491         /* Only do I/O if gfp is a superset of GFP_KERNEL */
> > > 492         if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> > > 493                 int how = FLUSH_SYNC;
> > > 494             
> > > 495                 /* Don't let kswapd deadlock waiting for OOM RPC calls */
> > > 496                 if (current_is_kswapd())
> > > 497                         how = 0;
> > > 498                 nfs_commit_inode(mapping->host, how);
> > > 499         }
> > > 500         /* If PagePrivate() is set, then the page is not freeable */
> > > 501         if (PagePrivate(page))
> > > 502                 return 0;
> > > 503         return nfs_fscache_release_page(page, gfp);
> > > 504 }
> > > ------------------------------------------------------------
> > > 
> > > Another option is to change the priority of the memory allocation:
> > > net/ipv4/af_inet.c
> > > ------------------------------------------------------------
> > >  265 static int inet_create(struct net *net, struct socket *sock, int
> > > protocol,
> > >  266                        int kern)
> > >  267 {
> > > ...
> > >  345         sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);
> > > ------------------------------------------------------------
> > > Considering this is generic net code, I assume the GFP_KERNEL will not
> > > be replaced with GFP_ATOMIC.
> > > 
> > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses
> > > 'legacy' workqueue code.
> > > 
> > > cya,
> > > #
> > > 
> > 
> > I suspect this is also a problem in mainline, but maybe some of the
> > recent writeback changes prevent it...
> > 
> > I think the right solution here is to make nfs_release_page treat rpciod
> > similarly to kswapd. Easier said than done though -- you'll need to
> > come up with a way to determine if you're running in rpciod context...
> 
> No. The _right_ solution is to ensure that rpciod doesn't do allocations
> that result in a page reclaim... try_to_release_page() is just the tip
> of the iceberg of crazy deadlocks that this socket allocation can get us
> into.
> 
> Unfortunately, selinux & co. prevent us from allocating the sockets in
> user contexts, and anyway, having to wait for another thread to do the
> same allocation isn't doing to help prevent the deadlock...
> 
> I know that Mel Gorman's NFS swap patches had some protections against
> this sort of deadlock. Perhaps we can look at how he was doing this?
> 

Actually...now that I've looked at this, I think the right solution
here is to stop calling sock_release() on these sockets until we're in
xs_destroy. IOW, don't free the socket altogether unless we're truly
tearing down the connection for good. That means though that we need to
convert the places that currently call rpc_xprt_ops->close to call
something like a (new) ops->shutdown routine instead, with the
assumption of course that we can somehow reset and reuse the socket
afterward.

The question is how best to emulate the effect of ->close without
actually freeing the socket. Is it sufficient to call something like
kernel_sock_shutdown() on it, and twiddle all of the bits to ensure
that the connect_worker will get called afterward?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-27 15:54     ` Jeff Layton
@ 2012-06-27 18:43       ` Myklebust, Trond
  2012-06-27 19:28         ` Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Myklebust, Trond @ 2012-06-27 18:43 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Harshula, linux-nfs

T24gV2VkLCAyMDEyLTA2LTI3IGF0IDExOjU0IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
T24gRnJpLCAxNSBKdW4gMjAxMiAyMToyNToxMCArMDAwMA0KPiAiTXlrbGVidXN0LCBUcm9uZCIg
PFRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gDQo+ID4gT24gRnJpLCAyMDEy
LTA2LTE1IGF0IDA5OjIxIC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4gPiA+IE9uIEZyaSwg
MTUgSnVuIDIwMTIgMjI6NTQ6MTAgKzEwMDANCj4gPiA+IEhhcnNodWxhIDxoYXJzaHVsYUByZWRo
YXQuY29tPiB3cm90ZToNCj4gPiA+IA0KPiA+ID4gPiBIaSBBbGwsDQo+ID4gPiA+IA0KPiA+ID4g
PiBDYW4gSSBwbGVhc2UgZ2V0IHlvdXIgZmVlZGJhY2sgb24gdGhlIGZvbGxvd2luZz8NCj4gPiA+
ID4gDQo+ID4gPiA+IHJwY2lvZCBpcyBibG9ja2VkOg0KPiA+ID4gPiAtLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0NCj4gPiA+ID4gY3Jh
c2g+IGJ0ICAyNTA3IA0KPiA+ID4gPiBQSUQ6IDI1MDcgICBUQVNLOiBmZmZmODgxMDM2OTFhYjQw
ICBDUFU6IDE0ICBDT01NQU5EOiAicnBjaW9kLzE0Ig0KPiA+ID4gPiAgIzAgW2ZmZmY4ODEwMzQz
YmYyZjBdIHNjaGVkdWxlIGF0IGZmZmZmZmZmODE0ZGFiZDkNCj4gPiA+ID4gICMxIFtmZmZmODgx
MDM0M2JmM2I4XSBuZnNfd2FpdF9iaXRfa2lsbGFibGUgYXQgZmZmZmZmZmZhMDM4ZmMwNCBbbmZz
XQ0KPiA+ID4gPiAgIzIgW2ZmZmY4ODEwMzQzYmYzYzhdIF9fd2FpdF9vbl9iaXQgYXQgZmZmZmZm
ZmY4MTRkYmMyZg0KPiA+ID4gPiAgIzMgW2ZmZmY4ODEwMzQzYmY0MThdIG91dF9vZl9saW5lX3dh
aXRfb25fYml0IGF0IGZmZmZmZmZmODE0ZGJjZDgNCj4gPiA+ID4gICM0IFtmZmZmODgxMDM0M2Jm
NDg4XSBuZnNfY29tbWl0X2lub2RlIGF0IGZmZmZmZmZmYTAzOWUwYzEgW25mc10NCj4gPiA+ID4g
ICM1IFtmZmZmODgxMDM0M2JmNGY4XSBuZnNfcmVsZWFzZV9wYWdlIGF0IGZmZmZmZmZmYTAzOGJl
ZjYgW25mc10NCj4gPiA+ID4gICM2IFtmZmZmODgxMDM0M2JmNTI4XSB0cnlfdG9fcmVsZWFzZV9w
YWdlIGF0IGZmZmZmZmZmODExMGM2NzANCj4gPiA+ID4gICM3IFtmZmZmODgxMDM0M2JmNTM4XSBz
aHJpbmtfcGFnZV9saXN0LmNsb25lLjAgYXQgZmZmZmZmZmY4MTEyNjI3MQ0KPiA+ID4gPiAgIzgg
W2ZmZmY4ODEwMzQzYmY2NjhdIHNocmlua19pbmFjdGl2ZV9saXN0IGF0IGZmZmZmZmZmODExMjY2
MzgNCj4gPiA+ID4gICM5IFtmZmZmODgxMDM0M2JmODE4XSBzaHJpbmtfem9uZSBhdCBmZmZmZmZm
ZjgxMTI3ODhmDQo+ID4gPiA+ICMxMCBbZmZmZjg4MTAzNDNiZjhjOF0gZG9fdHJ5X3RvX2ZyZWVf
cGFnZXMgYXQgZmZmZmZmZmY4MTEyN2IxZQ0KPiA+ID4gPiAjMTEgW2ZmZmY4ODEwMzQzYmY5NThd
IHRyeV90b19mcmVlX3BhZ2VzIGF0IGZmZmZmZmZmODExMjgxMmYNCj4gPiA+ID4gIzEyIFtmZmZm
ODgxMDM0M2JmYTA4XSBfX2FsbG9jX3BhZ2VzX25vZGVtYXNrIGF0IGZmZmZmZmZmODExMWZkYWQN
Cj4gPiA+ID4gIzEzIFtmZmZmODgxMDM0M2JmYjI4XSBrbWVtX2dldHBhZ2VzIGF0IGZmZmZmZmZm
ODExNTk5NDINCj4gPiA+ID4gIzE0IFtmZmZmODgxMDM0M2JmYjU4XSBmYWxsYmFja19hbGxvYyBh
dCBmZmZmZmZmZjgxMTVhNTVhDQo+ID4gPiA+ICMxNSBbZmZmZjg4MTAzNDNiZmJkOF0gX19fX2Nh
Y2hlX2FsbG9jX25vZGUgYXQgZmZmZmZmZmY4MTE1YTJkOQ0KPiA+ID4gPiAjMTYgW2ZmZmY4ODEw
MzQzYmZjMzhdIGttZW1fY2FjaGVfYWxsb2MgYXQgZmZmZmZmZmY4MTE1YjA5Yg0KPiA+ID4gPiAj
MTcgW2ZmZmY4ODEwMzQzYmZjNzhdIHNrX3Byb3RfYWxsb2MgYXQgZmZmZmZmZmY4MTQxMTgwOA0K
PiA+ID4gPiAjMTggW2ZmZmY4ODEwMzQzYmZjYjhdIHNrX2FsbG9jIGF0IGZmZmZmZmZmODE0MTE5
N2MNCj4gPiA+ID4gIzE5IFtmZmZmODgxMDM0M2JmY2U4XSBpbmV0X2NyZWF0ZSBhdCBmZmZmZmZm
ZjgxNDgzYmE2DQo+ID4gPiA+ICMyMCBbZmZmZjg4MTAzNDNiZmQzOF0gX19zb2NrX2NyZWF0ZSBh
dCBmZmZmZmZmZjgxNDBiNGE3DQo+ID4gPiA+ICMyMSBbZmZmZjg4MTAzNDNiZmQ5OF0geHNfY3Jl
YXRlX3NvY2sgYXQgZmZmZmZmZmZhMDFmNjQ5YiBbc3VucnBjXQ0KPiA+ID4gPiAjMjIgW2ZmZmY4
ODEwMzQzYmZkZDhdIHhzX3RjcF9zZXR1cF9zb2NrZXQgYXQgZmZmZmZmZmZhMDFmNjk2NSBbc3Vu
cnBjXQ0KPiA+ID4gPiAjMjMgW2ZmZmY4ODEwMzQzYmZlMzhdIHdvcmtlcl90aHJlYWQgYXQgZmZm
ZmZmZmY4MTA4ODdkMA0KPiA+ID4gPiAjMjQgW2ZmZmY4ODEwMzQzYmZlZThdIGt0aHJlYWQgYXQg
ZmZmZmZmZmY4MTA4ZGQ5Ng0KPiA+ID4gPiAjMjUgW2ZmZmY4ODEwMzQzYmZmNDhdIGtlcm5lbF90
aHJlYWQgYXQgZmZmZmZmZmY4MTAwYzFjYQ0KPiA+ID4gPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0NCj4gPiA+ID4gDQo+ID4gPiA+
IG5mc19yZWxlYXNlX3BhZ2UoKSBnaXZlcyBrc3dhcGQgcHJvY2VzcyBhbiBleGVtcHRpb24gZnJv
bSBiZWluZyBibG9ja2VkLg0KPiA+ID4gPiBTaG91bGQgd2UgZG8gdGhlIHNhbWUgZm9yIHJwY2lv
ZC4gT3RoZXJ3aXNlIHJwY2lvZCBjb3VsZCBlbmQgdXAgaW4gYQ0KPiA+ID4gPiBkZWFkbG9jayB3
aGVyZSBpdCBjYW4gbm90IGNvbnRpbnVlIHdpdGhvdXQgZnJlZWluZyBtZW1vcnkgdGhhdCB3aWxs
IG9ubHkNCj4gPiA+ID4gYmVjb21lIGF2YWlsYWJsZSB3aGVuIHJwY2lvZCBkb2VzIGl0cyB3b3Jr
Og0KPiA+ID4gPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0NCj4gPiA+ID4gNDc5IC8qDQo+ID4gPiA+IDQ4MCAgKiBBdHRlbXB0IHRv
IHJlbGVhc2UgdGhlIHByaXZhdGUgc3RhdGUgYXNzb2NpYXRlZCB3aXRoIGEgcGFnZQ0KPiA+ID4g
PiA0ODEgICogLSBDYWxsZWQgaWYgZWl0aGVyIFBHX3ByaXZhdGUgb3IgUEdfZnNjYWNoZSBpcyBz
ZXQgb24gdGhlIHBhZ2UNCj4gPiA+ID4gNDgyICAqIC0gQ2FsbGVyIGhvbGRzIHBhZ2UgbG9jaw0K
PiA+ID4gPiA0ODMgICogLSBSZXR1cm4gdHJ1ZSAobWF5IHJlbGVhc2UgcGFnZSkgb3IgZmFsc2Ug
KG1heSBub3QpDQo+ID4gPiA+IDQ4NCAgKi8NCj4gPiA+ID4gNDg1IHN0YXRpYyBpbnQgbmZzX3Jl
bGVhc2VfcGFnZShzdHJ1Y3QgcGFnZSAqcGFnZSwgZ2ZwX3QgZ2ZwKQ0KPiA+ID4gPiA0ODYgeyAg
IA0KPiA+ID4gPiA0ODcgICAgICAgICBzdHJ1Y3QgYWRkcmVzc19zcGFjZSAqbWFwcGluZyA9IHBh
Z2UtPm1hcHBpbmc7DQo+ID4gPiA+IDQ4OCAgICAgDQo+ID4gPiA+IDQ4OSAgICAgICAgIGRmcHJp
bnRrKFBBR0VDQUNIRSwgIk5GUzogcmVsZWFzZV9wYWdlKCVwKVxuIiwgcGFnZSk7DQo+ID4gPiA+
IDQ5MCAgICAgDQo+ID4gPiA+IDQ5MSAgICAgICAgIC8qIE9ubHkgZG8gSS9PIGlmIGdmcCBpcyBh
IHN1cGVyc2V0IG9mIEdGUF9LRVJORUwgKi8NCj4gPiA+ID4gNDkyICAgICAgICAgaWYgKG1hcHBp
bmcgJiYgKGdmcCAmIEdGUF9LRVJORUwpID09IEdGUF9LRVJORUwpIHsNCj4gPiA+ID4gNDkzICAg
ICAgICAgICAgICAgICBpbnQgaG93ID0gRkxVU0hfU1lOQzsNCj4gPiA+ID4gNDk0ICAgICAgICAg
ICAgIA0KPiA+ID4gPiA0OTUgICAgICAgICAgICAgICAgIC8qIERvbid0IGxldCBrc3dhcGQgZGVh
ZGxvY2sgd2FpdGluZyBmb3IgT09NIFJQQyBjYWxscyAqLw0KPiA+ID4gPiA0OTYgICAgICAgICAg
ICAgICAgIGlmIChjdXJyZW50X2lzX2tzd2FwZCgpKQ0KPiA+ID4gPiA0OTcgICAgICAgICAgICAg
ICAgICAgICAgICAgaG93ID0gMDsNCj4gPiA+ID4gNDk4ICAgICAgICAgICAgICAgICBuZnNfY29t
bWl0X2lub2RlKG1hcHBpbmctPmhvc3QsIGhvdyk7DQo+ID4gPiA+IDQ5OSAgICAgICAgIH0NCj4g
PiA+ID4gNTAwICAgICAgICAgLyogSWYgUGFnZVByaXZhdGUoKSBpcyBzZXQsIHRoZW4gdGhlIHBh
Z2UgaXMgbm90IGZyZWVhYmxlICovDQo+ID4gPiA+IDUwMSAgICAgICAgIGlmIChQYWdlUHJpdmF0
ZShwYWdlKSkNCj4gPiA+ID4gNTAyICAgICAgICAgICAgICAgICByZXR1cm4gMDsNCj4gPiA+ID4g
NTAzICAgICAgICAgcmV0dXJuIG5mc19mc2NhY2hlX3JlbGVhc2VfcGFnZShwYWdlLCBnZnApOw0K
PiA+ID4gPiA1MDQgfQ0KPiA+ID4gPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0NCj4gPiA+ID4gDQo+ID4gPiA+IEFub3RoZXIgb3B0
aW9uIGlzIHRvIGNoYW5nZSB0aGUgcHJpb3JpdHkgb2YgdGhlIG1lbW9yeSBhbGxvY2F0aW9uOg0K
PiA+ID4gPiBuZXQvaXB2NC9hZl9pbmV0LmMNCj4gPiA+ID4gLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tDQo+ID4gPiA+ICAyNjUgc3Rh
dGljIGludCBpbmV0X2NyZWF0ZShzdHJ1Y3QgbmV0ICpuZXQsIHN0cnVjdCBzb2NrZXQgKnNvY2ss
IGludA0KPiA+ID4gPiBwcm90b2NvbCwNCj4gPiA+ID4gIDI2NiAgICAgICAgICAgICAgICAgICAg
ICAgIGludCBrZXJuKQ0KPiA+ID4gPiAgMjY3IHsNCj4gPiA+ID4gLi4uDQo+ID4gPiA+ICAzNDUg
ICAgICAgICBzayA9IHNrX2FsbG9jKG5ldCwgUEZfSU5FVCwgR0ZQX0tFUk5FTCwgYW5zd2VyX3By
b3QpOw0KPiA+ID4gPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0NCj4gPiA+ID4gQ29uc2lkZXJpbmcgdGhpcyBpcyBnZW5lcmljIG5l
dCBjb2RlLCBJIGFzc3VtZSB0aGUgR0ZQX0tFUk5FTCB3aWxsIG5vdA0KPiA+ID4gPiBiZSByZXBs
YWNlZCB3aXRoIEdGUF9BVE9NSUMuDQo+ID4gPiA+IA0KPiA+ID4gPiBOT1RFLCB0aGlzIGlzIG9u
IFJIRUwgNi4xIGtlcm5lbCAyLjYuMzItMTMxLjYuMSBhbmQgYXBwYXJlbnRseSB1c2VzDQo+ID4g
PiA+ICdsZWdhY3knIHdvcmtxdWV1ZSBjb2RlLg0KPiA+ID4gPiANCj4gPiA+ID4gY3lhLA0KPiA+
ID4gPiAjDQo+ID4gPiA+IA0KPiA+ID4gDQo+ID4gPiBJIHN1c3BlY3QgdGhpcyBpcyBhbHNvIGEg
cHJvYmxlbSBpbiBtYWlubGluZSwgYnV0IG1heWJlIHNvbWUgb2YgdGhlDQo+ID4gPiByZWNlbnQg
d3JpdGViYWNrIGNoYW5nZXMgcHJldmVudCBpdC4uLg0KPiA+ID4gDQo+ID4gPiBJIHRoaW5rIHRo
ZSByaWdodCBzb2x1dGlvbiBoZXJlIGlzIHRvIG1ha2UgbmZzX3JlbGVhc2VfcGFnZSB0cmVhdCBy
cGNpb2QNCj4gPiA+IHNpbWlsYXJseSB0byBrc3dhcGQuIEVhc2llciBzYWlkIHRoYW4gZG9uZSB0
aG91Z2ggLS0geW91J2xsIG5lZWQgdG8NCj4gPiA+IGNvbWUgdXAgd2l0aCBhIHdheSB0byBkZXRl
cm1pbmUgaWYgeW91J3JlIHJ1bm5pbmcgaW4gcnBjaW9kIGNvbnRleHQuLi4NCj4gPiANCj4gPiBO
by4gVGhlIF9yaWdodF8gc29sdXRpb24gaXMgdG8gZW5zdXJlIHRoYXQgcnBjaW9kIGRvZXNuJ3Qg
ZG8gYWxsb2NhdGlvbnMNCj4gPiB0aGF0IHJlc3VsdCBpbiBhIHBhZ2UgcmVjbGFpbS4uLiB0cnlf
dG9fcmVsZWFzZV9wYWdlKCkgaXMganVzdCB0aGUgdGlwDQo+ID4gb2YgdGhlIGljZWJlcmcgb2Yg
Y3JhenkgZGVhZGxvY2tzIHRoYXQgdGhpcyBzb2NrZXQgYWxsb2NhdGlvbiBjYW4gZ2V0IHVzDQo+
ID4gaW50by4NCj4gPiANCj4gPiBVbmZvcnR1bmF0ZWx5LCBzZWxpbnV4ICYgY28uIHByZXZlbnQg
dXMgZnJvbSBhbGxvY2F0aW5nIHRoZSBzb2NrZXRzIGluDQo+ID4gdXNlciBjb250ZXh0cywgYW5k
IGFueXdheSwgaGF2aW5nIHRvIHdhaXQgZm9yIGFub3RoZXIgdGhyZWFkIHRvIGRvIHRoZQ0KPiA+
IHNhbWUgYWxsb2NhdGlvbiBpc24ndCBkb2luZyB0byBoZWxwIHByZXZlbnQgdGhlIGRlYWRsb2Nr
Li4uDQo+ID4gDQo+ID4gSSBrbm93IHRoYXQgTWVsIEdvcm1hbidzIE5GUyBzd2FwIHBhdGNoZXMg
aGFkIHNvbWUgcHJvdGVjdGlvbnMgYWdhaW5zdA0KPiA+IHRoaXMgc29ydCBvZiBkZWFkbG9jay4g
UGVyaGFwcyB3ZSBjYW4gbG9vayBhdCBob3cgaGUgd2FzIGRvaW5nIHRoaXM/DQo+ID4gDQo+IA0K
PiBBY3R1YWxseS4uLm5vdyB0aGF0IEkndmUgbG9va2VkIGF0IHRoaXMsIEkgdGhpbmsgdGhlIHJp
Z2h0IHNvbHV0aW9uDQo+IGhlcmUgaXMgdG8gc3RvcCBjYWxsaW5nIHNvY2tfcmVsZWFzZSgpIG9u
IHRoZXNlIHNvY2tldHMgdW50aWwgd2UncmUgaW4NCj4geHNfZGVzdHJveS4gSU9XLCBkb24ndCBm
cmVlIHRoZSBzb2NrZXQgYWx0b2dldGhlciB1bmxlc3Mgd2UncmUgdHJ1bHkNCj4gdGVhcmluZyBk
b3duIHRoZSBjb25uZWN0aW9uIGZvciBnb29kLiBUaGF0IG1lYW5zIHRob3VnaCB0aGF0IHdlIG5l
ZWQgdG8NCj4gY29udmVydCB0aGUgcGxhY2VzIHRoYXQgY3VycmVudGx5IGNhbGwgcnBjX3hwcnRf
b3BzLT5jbG9zZSB0byBjYWxsDQo+IHNvbWV0aGluZyBsaWtlIGEgKG5ldykgb3BzLT5zaHV0ZG93
biByb3V0aW5lIGluc3RlYWQsIHdpdGggdGhlDQo+IGFzc3VtcHRpb24gb2YgY291cnNlIHRoYXQg
d2UgY2FuIHNvbWVob3cgcmVzZXQgYW5kIHJldXNlIHRoZSBzb2NrZXQNCj4gYWZ0ZXJ3YXJkLg0K
PiANCj4gVGhlIHF1ZXN0aW9uIGlzIGhvdyBiZXN0IHRvIGVtdWxhdGUgdGhlIGVmZmVjdCBvZiAt
PmNsb3NlIHdpdGhvdXQNCj4gYWN0dWFsbHkgZnJlZWluZyB0aGUgc29ja2V0LiBJcyBpdCBzdWZm
aWNpZW50IHRvIGNhbGwgc29tZXRoaW5nIGxpa2UNCj4ga2VybmVsX3NvY2tfc2h1dGRvd24oKSBv
biBpdCwgYW5kIHR3aWRkbGUgYWxsIG9mIHRoZSBiaXRzIHRvIGVuc3VyZQ0KPiB0aGF0IHRoZSBj
b25uZWN0X3dvcmtlciB3aWxsIGdldCBjYWxsZWQgYWZ0ZXJ3YXJkPw0KDQpUaGUgcmVhc29uIHdo
eSB3ZSBjbG9zZSB0aGVzZSBzb2NrZXRzIGlzIHRoYXQgaWYgdGhlIGF0dGVtcHQgYXQgYWJvcnRp
bmcNCnRoZSBjb25uZWN0aW9uIGZhaWxzLCB0aGVuIHRoZXkgdHlwaWNhbGx5IGVuZCB1cCBpbiBh
IFRJTUVfV0FJVCBzdGF0ZS4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGll
bnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cu
bmV0YXBwLmNvbQ0KDQo=

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-27 18:43       ` Myklebust, Trond
@ 2012-06-27 19:28         ` Jeff Layton
  2012-06-27 19:37           ` Myklebust, Trond
  2012-06-27 19:56           ` Myklebust, Trond
  0 siblings, 2 replies; 23+ messages in thread
From: Jeff Layton @ 2012-06-27 19:28 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Harshula, linux-nfs

On Wed, 27 Jun 2012 18:43:56 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Wed, 2012-06-27 at 11:54 -0400, Jeff Layton wrote:
> > On Fri, 15 Jun 2012 21:25:10 +0000
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote:
> > > > On Fri, 15 Jun 2012 22:54:10 +1000
> > > > Harshula <harshula@redhat.com> wrote:
> > > > 
> > > > > Hi All,
> > > > > 
> > > > > Can I please get your feedback on the following?
> > > > > 
> > > > > rpciod is blocked:
> > > > > ------------------------------------------------------------
> > > > > crash> bt  2507 
> > > > > PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
> > > > >  #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
> > > > >  #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
> > > > >  #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
> > > > >  #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
> > > > >  #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
> > > > >  #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
> > > > >  #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
> > > > >  #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
> > > > >  #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
> > > > >  #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
> > > > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
> > > > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
> > > > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
> > > > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
> > > > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
> > > > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
> > > > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
> > > > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
> > > > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
> > > > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
> > > > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
> > > > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
> > > > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
> > > > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
> > > > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
> > > > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
> > > > > ------------------------------------------------------------
> > > > > 
> > > > > nfs_release_page() gives kswapd process an exemption from being blocked.
> > > > > Should we do the same for rpciod. Otherwise rpciod could end up in a
> > > > > deadlock where it can not continue without freeing memory that will only
> > > > > become available when rpciod does its work:
> > > > > ------------------------------------------------------------
> > > > > 479 /*
> > > > > 480  * Attempt to release the private state associated with a page
> > > > > 481  * - Called if either PG_private or PG_fscache is set on the page
> > > > > 482  * - Caller holds page lock
> > > > > 483  * - Return true (may release page) or false (may not)
> > > > > 484  */
> > > > > 485 static int nfs_release_page(struct page *page, gfp_t gfp)
> > > > > 486 {   
> > > > > 487         struct address_space *mapping = page->mapping;
> > > > > 488     
> > > > > 489         dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
> > > > > 490     
> > > > > 491         /* Only do I/O if gfp is a superset of GFP_KERNEL */
> > > > > 492         if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> > > > > 493                 int how = FLUSH_SYNC;
> > > > > 494             
> > > > > 495                 /* Don't let kswapd deadlock waiting for OOM RPC calls */
> > > > > 496                 if (current_is_kswapd())
> > > > > 497                         how = 0;
> > > > > 498                 nfs_commit_inode(mapping->host, how);
> > > > > 499         }
> > > > > 500         /* If PagePrivate() is set, then the page is not freeable */
> > > > > 501         if (PagePrivate(page))
> > > > > 502                 return 0;
> > > > > 503         return nfs_fscache_release_page(page, gfp);
> > > > > 504 }
> > > > > ------------------------------------------------------------
> > > > > 
> > > > > Another option is to change the priority of the memory allocation:
> > > > > net/ipv4/af_inet.c
> > > > > ------------------------------------------------------------
> > > > >  265 static int inet_create(struct net *net, struct socket *sock, int
> > > > > protocol,
> > > > >  266                        int kern)
> > > > >  267 {
> > > > > ...
> > > > >  345         sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);
> > > > > ------------------------------------------------------------
> > > > > Considering this is generic net code, I assume the GFP_KERNEL will not
> > > > > be replaced with GFP_ATOMIC.
> > > > > 
> > > > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses
> > > > > 'legacy' workqueue code.
> > > > > 
> > > > > cya,
> > > > > #
> > > > > 
> > > > 
> > > > I suspect this is also a problem in mainline, but maybe some of the
> > > > recent writeback changes prevent it...
> > > > 
> > > > I think the right solution here is to make nfs_release_page treat rpciod
> > > > similarly to kswapd. Easier said than done though -- you'll need to
> > > > come up with a way to determine if you're running in rpciod context...
> > > 
> > > No. The _right_ solution is to ensure that rpciod doesn't do allocations
> > > that result in a page reclaim... try_to_release_page() is just the tip
> > > of the iceberg of crazy deadlocks that this socket allocation can get us
> > > into.
> > > 
> > > Unfortunately, selinux & co. prevent us from allocating the sockets in
> > > user contexts, and anyway, having to wait for another thread to do the
> > > same allocation isn't doing to help prevent the deadlock...
> > > 
> > > I know that Mel Gorman's NFS swap patches had some protections against
> > > this sort of deadlock. Perhaps we can look at how he was doing this?
> > > 
> > 
> > Actually...now that I've looked at this, I think the right solution
> > here is to stop calling sock_release() on these sockets until we're in
> > xs_destroy. IOW, don't free the socket altogether unless we're truly
> > tearing down the connection for good. That means though that we need to
> > convert the places that currently call rpc_xprt_ops->close to call
> > something like a (new) ops->shutdown routine instead, with the
> > assumption of course that we can somehow reset and reuse the socket
> > afterward.
> > 
> > The question is how best to emulate the effect of ->close without
> > actually freeing the socket. Is it sufficient to call something like
> > kernel_sock_shutdown() on it, and twiddle all of the bits to ensure
> > that the connect_worker will get called afterward?
> 
> The reason why we close these sockets is that if the attempt at aborting
> the connection fails, then they typically end up in a TIME_WAIT state.
> 

I'm still trying to wade through the xprtsock.c socket handling code,
but it looks like we currently tear down the connection in 3 different
ways:

xs_close: which basically calls sock_release and gets rid of our
reference to an existing socket. Most of the places where we disconnect
the socket use this. After this, we end up with srcport == 0 which
makes it pick a new port.

xs_tcp_shutdown: which calls ->shutdown on it, but doesn't free
anything. This also preserves the existing srcport.

xs_abort_connection: calls kernel_connect to reconnect the socket to
AF_UNSPEC address (effectively disconnecting it?). This also preserves
the srcport. I guess we use this just before reconnecting when the
remote end drops the connection, since we don't need to be graceful
about tearing anything down at that point.

The last one actually does reuse the same socket, so my thinking was
that we could extend that scheme to the other cases. If we called
->shutdown on it and then reconnected it to AF_UNSPEC, would that
"reset" it back to a usable state?

If there really is no alternative to freeing the socket, then the only
real fix I can see is to set PF_MEMALLOC when we go to create it and
then reset it afterward. That's a pretty ugly fix though...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-27 19:28         ` Jeff Layton
@ 2012-06-27 19:37           ` Myklebust, Trond
  2012-06-27 20:18             ` Jeff Layton
  2012-06-27 19:56           ` Myklebust, Trond
  1 sibling, 1 reply; 23+ messages in thread
From: Myklebust, Trond @ 2012-06-27 19:37 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Harshula, linux-nfs

T24gV2VkLCAyMDEyLTA2LTI3IGF0IDE1OjI4IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
T24gV2VkLCAyNyBKdW4gMjAxMiAxODo0Mzo1NiArMDAwMA0KPiAiTXlrbGVidXN0LCBUcm9uZCIg
PFRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gPiBUaGUgcmVhc29uIHdoeSB3
ZSBjbG9zZSB0aGVzZSBzb2NrZXRzIGlzIHRoYXQgaWYgdGhlIGF0dGVtcHQgYXQgYWJvcnRpbmcN
Cj4gPiB0aGUgY29ubmVjdGlvbiBmYWlscywgdGhlbiB0aGV5IHR5cGljYWxseSBlbmQgdXAgaW4g
YSBUSU1FX1dBSVQgc3RhdGUuDQo+ID4gDQo+IA0KPiBJJ20gc3RpbGwgdHJ5aW5nIHRvIHdhZGUg
dGhyb3VnaCB0aGUgeHBydHNvY2suYyBzb2NrZXQgaGFuZGxpbmcgY29kZSwNCj4gYnV0IGl0IGxv
b2tzIGxpa2Ugd2UgY3VycmVudGx5IHRlYXIgZG93biB0aGUgY29ubmVjdGlvbiBpbiAzIGRpZmZl
cmVudA0KPiB3YXlzOg0KPiANCj4geHNfY2xvc2U6IHdoaWNoIGJhc2ljYWxseSBjYWxscyBzb2Nr
X3JlbGVhc2UgYW5kIGdldHMgcmlkIG9mIG91cg0KPiByZWZlcmVuY2UgdG8gYW4gZXhpc3Rpbmcg
c29ja2V0LiBNb3N0IG9mIHRoZSBwbGFjZXMgd2hlcmUgd2UgZGlzY29ubmVjdA0KPiB0aGUgc29j
a2V0IHVzZSB0aGlzLiBBZnRlciB0aGlzLCB3ZSBlbmQgdXAgd2l0aCBzcmNwb3J0ID09IDAgd2hp
Y2gNCj4gbWFrZXMgaXQgcGljayBhIG5ldyBwb3J0Lg0KPiANCj4geHNfdGNwX3NodXRkb3duOiB3
aGljaCBjYWxscyAtPnNodXRkb3duIG9uIGl0LCBidXQgZG9lc24ndCBmcmVlDQo+IGFueXRoaW5n
LiBUaGlzIGFsc28gcHJlc2VydmVzIHRoZSBleGlzdGluZyBzcmNwb3J0Lg0KPiANCj4geHNfYWJv
cnRfY29ubmVjdGlvbjogY2FsbHMga2VybmVsX2Nvbm5lY3QgdG8gcmVjb25uZWN0IHRoZSBzb2Nr
ZXQgdG8NCj4gQUZfVU5TUEVDIGFkZHJlc3MgKGVmZmVjdGl2ZWx5IGRpc2Nvbm5lY3RpbmcgaXQ/
KS4gVGhpcyBhbHNvIHByZXNlcnZlcw0KPiB0aGUgc3JjcG9ydC4gSSBndWVzcyB3ZSB1c2UgdGhp
cyBqdXN0IGJlZm9yZSByZWNvbm5lY3Rpbmcgd2hlbiB0aGUNCj4gcmVtb3RlIGVuZCBkcm9wcyB0
aGUgY29ubmVjdGlvbiwgc2luY2Ugd2UgZG9uJ3QgbmVlZCB0byBiZSBncmFjZWZ1bA0KPiBhYm91
dCB0ZWFyaW5nIGFueXRoaW5nIGRvd24gYXQgdGhhdCBwb2ludC4NCj4gDQo+IFRoZSBsYXN0IG9u
ZSBhY3R1YWxseSBkb2VzIHJldXNlIHRoZSBzYW1lIHNvY2tldCwgc28gbXkgdGhpbmtpbmcgd2Fz
DQo+IHRoYXQgd2UgY291bGQgZXh0ZW5kIHRoYXQgc2NoZW1lIHRvIHRoZSBvdGhlciBjYXNlcy4g
SWYgd2UgY2FsbGVkDQo+IC0+c2h1dGRvd24gb24gaXQgYW5kIHRoZW4gcmVjb25uZWN0ZWQgaXQg
dG8gQUZfVU5TUEVDLCB3b3VsZCB0aGF0DQo+ICJyZXNldCIgaXQgYmFjayB0byBhIHVzYWJsZSBz
dGF0ZT8NCg0KTm90IHRoYXQgSSdtIGF3YXJlIG9mLiBUaGUgcHJvYmxlbSBpcyB0aGF0IG1vc3Qg
b2YgdGhpcyBzdHVmZiBpcw0KdW5kb2N1bWVudGVkLiBGb3IgaW5zdGFuY2UsIHRoZSBBRl9VTlNQ
RUMgcmVjb25uZWN0IGlzIGRvY3VtZW50ZWQgb25seQ0KZm9yIFVEUCBjb25uZWN0aW9ucy4gV2hp
bGUgTGludXggaW1wbGVtZW50cyBpdCBmb3IgVENQIHRvbywgdGhlcmUgaXMgbm8NCnNwZWMgKHRo
YXQgSSdtIGF3YXJlIG9mKSB0aGF0IGV4cGxhaW5zIGhvdyB0aGF0IHNob3VsZCB3b3JrLg0KDQo+
IElmIHRoZXJlIHJlYWxseSBpcyBubyBhbHRlcm5hdGl2ZSB0byBmcmVlaW5nIHRoZSBzb2NrZXQs
IHRoZW4gdGhlIG9ubHkNCj4gcmVhbCBmaXggSSBjYW4gc2VlIGlzIHRvIHNldCBQRl9NRU1BTExP
QyB3aGVuIHdlIGdvIHRvIGNyZWF0ZSBpdCBhbmQNCj4gdGhlbiByZXNldCBpdCBhZnRlcndhcmQu
IFRoYXQncyBhIHByZXR0eSB1Z2x5IGZpeCB0aG91Z2guLi4NCg0KQWdyZWVkLi4uDQoNCi0tIA0K
VHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpU
cm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg0K

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-27 19:28         ` Jeff Layton
  2012-06-27 19:37           ` Myklebust, Trond
@ 2012-06-27 19:56           ` Myklebust, Trond
  2012-06-27 20:19             ` Jeff Layton
  1 sibling, 1 reply; 23+ messages in thread
From: Myklebust, Trond @ 2012-06-27 19:56 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Harshula, linux-nfs

T24gV2VkLCAyMDEyLTA2LTI3IGF0IDE1OjI4IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
T24gV2VkLCAyNyBKdW4gMjAxMiAxODo0Mzo1NiArMDAwMA0KPiAiTXlrbGVidXN0LCBUcm9uZCIg
PFRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gSWYgdGhlcmUgcmVhbGx5IGlz
IG5vIGFsdGVybmF0aXZlIHRvIGZyZWVpbmcgdGhlIHNvY2tldCwgdGhlbiB0aGUgb25seQ0KPiBy
ZWFsIGZpeCBJIGNhbiBzZWUgaXMgdG8gc2V0IFBGX01FTUFMTE9DIHdoZW4gd2UgZ28gdG8gY3Jl
YXRlIGl0IGFuZA0KPiB0aGVuIHJlc2V0IGl0IGFmdGVyd2FyZC4gVGhhdCdzIGEgcHJldHR5IHVn
bHkgZml4IHRob3VnaC4uLg0KDQpJIGNhbiB0aGluayBvZiAyIHBvc3NpYmxlIGFsdGVybmF0aXZl
czoNCg0KICAgICAxLiBVc2UgdGhlIFBGX0ZTVFJBTlMgZmxhZyB0byB0ZWxsIG5mc19yZWxlYXNl
X3BhZ2UgdGhhdCB0aGlzIGlzIGENCiAgICAgICAgZGlyZWN0IHJlY2xhaW0gZnJvbSBycGNpb2Qu
DQogICAgIDIuIEFkZCBhICdjb25nZXN0ZWQnIGZsYWcgdG8gdGhlIHJwYyBjbGllbnQgdGhhdCBj
b3VsZCB0ZWxsIHJlY2xhaW0NCiAgICAgICAgdHlwZSBvcGVyYXRpb25zIHRvIGdpdmUgdXAgaWYg
YSBzb2NrZXQgYWxsb2NhdGlvbiBpcyBpbg0KICAgICAgICBwcm9ncmVzcy4NCg0KLS0gDQpUcm9u
ZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25k
Lk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0KDQo=

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-27 19:37           ` Myklebust, Trond
@ 2012-06-27 20:18             ` Jeff Layton
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2012-06-27 20:18 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Harshula, linux-nfs

On Wed, 27 Jun 2012 19:37:03 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote:
> > On Wed, 27 Jun 2012 18:43:56 +0000
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > > The reason why we close these sockets is that if the attempt at aborting
> > > the connection fails, then they typically end up in a TIME_WAIT state.
> > > 
> > 
> > I'm still trying to wade through the xprtsock.c socket handling code,
> > but it looks like we currently tear down the connection in 3 different
> > ways:
> > 
> > xs_close: which basically calls sock_release and gets rid of our
> > reference to an existing socket. Most of the places where we disconnect
> > the socket use this. After this, we end up with srcport == 0 which
> > makes it pick a new port.
> > 
> > xs_tcp_shutdown: which calls ->shutdown on it, but doesn't free
> > anything. This also preserves the existing srcport.
> > 
> > xs_abort_connection: calls kernel_connect to reconnect the socket to
> > AF_UNSPEC address (effectively disconnecting it?). This also preserves
> > the srcport. I guess we use this just before reconnecting when the
> > remote end drops the connection, since we don't need to be graceful
> > about tearing anything down at that point.
> > 
> > The last one actually does reuse the same socket, so my thinking was
> > that we could extend that scheme to the other cases. If we called
> > ->shutdown on it and then reconnected it to AF_UNSPEC, would that
> > "reset" it back to a usable state?
> 
> Not that I'm aware of. The problem is that most of this stuff is
> undocumented. For instance, the AF_UNSPEC reconnect is documented only
> for UDP connections. While Linux implements it for TCP too, there is no
> spec (that I'm aware of) that explains how that should work.
> 

...and fwiw, it looks like reconnecting a TCP socket to an AF_UNSPEC
address doesn't work from userland -- you get back EINVAL. I have to
wonder if xs_abort_connection actually works as expected...

> > If there really is no alternative to freeing the socket, then the only
> > real fix I can see is to set PF_MEMALLOC when we go to create it and
> > then reset it afterward. That's a pretty ugly fix though...
> 
> Agreed...
> 

That looks basically like what Mel is doing to work around the problem,
though he only does it for xprt's that are tagged as being swapped
over. We could just make that unconditional, but the "congested" flag
scheme sounds better.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-27 19:56           ` Myklebust, Trond
@ 2012-06-27 20:19             ` Jeff Layton
  2012-06-27 20:46               ` Myklebust, Trond
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2012-06-27 20:19 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Harshula, linux-nfs

On Wed, 27 Jun 2012 19:56:38 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote:
> > On Wed, 27 Jun 2012 18:43:56 +0000
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > If there really is no alternative to freeing the socket, then the only
> > real fix I can see is to set PF_MEMALLOC when we go to create it and
> > then reset it afterward. That's a pretty ugly fix though...
> 
> I can think of 2 possible alternatives:
> 
>      1. Use the PF_FSTRANS flag to tell nfs_release_page that this is a
>         direct reclaim from rpciod.

Hmm...that's a bit indiscriminate, isn't it? Looks like only XFS uses
that flag now...

Suppose we're in some XFS allocation and then want to commit a NFS
page to satisfy it. There's no real reason we couldn't but that would
prevent it...

>      2. Add a 'congested' flag to the rpc client that could tell reclaim
>         type operations to give up if a socket allocation is in
>         progress.
> 

That's not a bad idea. That has the benefit of only skipping reclaim
when it's directly associated with the socket being reconnected. Seems
like that would have to hang off of the rpc_xprt, but we could do
something along those lines. Presumably Mel's swap-over-nfs patches
could be converted to do something similar instead of using PF_MEMALLOC
there too.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-27 20:19             ` Jeff Layton
@ 2012-06-27 20:46               ` Myklebust, Trond
  2012-06-28 13:19                 ` Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Myklebust, Trond @ 2012-06-27 20:46 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Harshula, linux-nfs

T24gV2VkLCAyMDEyLTA2LTI3IGF0IDE2OjE5IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
T24gV2VkLCAyNyBKdW4gMjAxMiAxOTo1NjozOCArMDAwMA0KPiAiTXlrbGVidXN0LCBUcm9uZCIg
PFRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gDQo+ID4gT24gV2VkLCAyMDEy
LTA2LTI3IGF0IDE1OjI4IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4gPiA+IE9uIFdlZCwg
MjcgSnVuIDIwMTIgMTg6NDM6NTYgKzAwMDANCj4gPiA+ICJNeWtsZWJ1c3QsIFRyb25kIiA8VHJv
bmQuTXlrbGVidXN0QG5ldGFwcC5jb20+IHdyb3RlOg0KPiA+ID4gSWYgdGhlcmUgcmVhbGx5IGlz
IG5vIGFsdGVybmF0aXZlIHRvIGZyZWVpbmcgdGhlIHNvY2tldCwgdGhlbiB0aGUgb25seQ0KPiA+
ID4gcmVhbCBmaXggSSBjYW4gc2VlIGlzIHRvIHNldCBQRl9NRU1BTExPQyB3aGVuIHdlIGdvIHRv
IGNyZWF0ZSBpdCBhbmQNCj4gPiA+IHRoZW4gcmVzZXQgaXQgYWZ0ZXJ3YXJkLiBUaGF0J3MgYSBw
cmV0dHkgdWdseSBmaXggdGhvdWdoLi4uDQo+ID4gDQo+ID4gSSBjYW4gdGhpbmsgb2YgMiBwb3Nz
aWJsZSBhbHRlcm5hdGl2ZXM6DQo+ID4gDQo+ID4gICAgICAxLiBVc2UgdGhlIFBGX0ZTVFJBTlMg
ZmxhZyB0byB0ZWxsIG5mc19yZWxlYXNlX3BhZ2UgdGhhdCB0aGlzIGlzIGENCj4gPiAgICAgICAg
IGRpcmVjdCByZWNsYWltIGZyb20gcnBjaW9kLg0KPiANCj4gSG1tLi4udGhhdCdzIGEgYml0IGlu
ZGlzY3JpbWluYXRlLCBpc24ndCBpdD8gTG9va3MgbGlrZSBvbmx5IFhGUyB1c2VzDQo+IHRoYXQg
ZmxhZyBub3cuLi4NCj4gDQo+IFN1cHBvc2Ugd2UncmUgaW4gc29tZSBYRlMgYWxsb2NhdGlvbiBh
bmQgdGhlbiB3YW50IHRvIGNvbW1pdCBhIE5GUw0KPiBwYWdlIHRvIHNhdGlzZnkgaXQuIFRoZXJl
J3Mgbm8gcmVhbCByZWFzb24gd2UgY291bGRuJ3QgYnV0IHRoYXQgd291bGQNCj4gcHJldmVudCBp
dC4uLg0KDQpTbyBkbyBhbGwgdGhlIEdGUF9OT0ZTIGFsbG9jYXRpb25zLiBIb3cgaXMgdGhpcyBh
bnkgZGlmZmVyZW50Pw0KDQo+ID4gICAgICAyLiBBZGQgYSAnY29uZ2VzdGVkJyBmbGFnIHRvIHRo
ZSBycGMgY2xpZW50IHRoYXQgY291bGQgdGVsbCByZWNsYWltDQo+ID4gICAgICAgICB0eXBlIG9w
ZXJhdGlvbnMgdG8gZ2l2ZSB1cCBpZiBhIHNvY2tldCBhbGxvY2F0aW9uIGlzIGluDQo+ID4gICAg
ICAgICBwcm9ncmVzcy4NCj4gPiANCj4gDQo+IFRoYXQncyBub3QgYSBiYWQgaWRlYS4gVGhhdCBo
YXMgdGhlIGJlbmVmaXQgb2Ygb25seSBza2lwcGluZyByZWNsYWltDQo+IHdoZW4gaXQncyBkaXJl
Y3RseSBhc3NvY2lhdGVkIHdpdGggdGhlIHNvY2tldCBiZWluZyByZWNvbm5lY3RlZC4gU2VlbXMN
Cj4gbGlrZSB0aGF0IHdvdWxkIGhhdmUgdG8gaGFuZyBvZmYgb2YgdGhlIHJwY194cHJ0LCBidXQg
d2UgY291bGQgZG8NCj4gc29tZXRoaW5nIGFsb25nIHRob3NlIGxpbmVzLiBQcmVzdW1hYmx5IE1l
bCdzIHN3YXAtb3Zlci1uZnMgcGF0Y2hlcw0KPiBjb3VsZCBiZSBjb252ZXJ0ZWQgdG8gZG8gc29t
ZXRoaW5nIHNpbWlsYXIgaW5zdGVhZCBvZiB1c2luZyBQRl9NRU1BTExPQw0KPiB0aGVyZSB0b28u
DQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXINCg0K
TmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg0K

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-27 20:46               ` Myklebust, Trond
@ 2012-06-28 13:19                 ` Jeff Layton
  2012-06-28 13:40                   ` Myklebust, Trond
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2012-06-28 13:19 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Harshula, linux-nfs, mgorman

On Wed, 27 Jun 2012 20:46:49 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Wed, 2012-06-27 at 16:19 -0400, Jeff Layton wrote:
> > On Wed, 27 Jun 2012 19:56:38 +0000
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote:
> > > > On Wed, 27 Jun 2012 18:43:56 +0000
> > > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > > > If there really is no alternative to freeing the socket, then the only
> > > > real fix I can see is to set PF_MEMALLOC when we go to create it and
> > > > then reset it afterward. That's a pretty ugly fix though...
> > > 
> > > I can think of 2 possible alternatives:
> > > 
> > >      1. Use the PF_FSTRANS flag to tell nfs_release_page that this is a
> > >         direct reclaim from rpciod.
> > 
> > Hmm...that's a bit indiscriminate, isn't it? Looks like only XFS uses
> > that flag now...
> > 
> > Suppose we're in some XFS allocation and then want to commit a NFS
> > page to satisfy it. There's no real reason we couldn't but that would
> > prevent it...
> 
> So do all the GFP_NOFS allocations. How is this any different?
> 

Good point. Now that I've looked closer, I think this is probably the
best approach. Proposed patch below. It builds but is otherwise
untested so far...

> > >      2. Add a 'congested' flag to the rpc client that could tell reclaim
> > >         type operations to give up if a socket allocation is in
> > >         progress.
> > > 
> > 
> > That's not a bad idea. That has the benefit of only skipping reclaim
> > when it's directly associated with the socket being reconnected. Seems
> > like that would have to hang off of the rpc_xprt, but we could do
> > something along those lines. Presumably Mel's swap-over-nfs patches
> > could be converted to do something similar instead of using PF_MEMALLOC
> > there too.
> 

The problem with the above scheme is that it would be racy. It would
easily be possible for the congested flag to flip just after you
checked it but before issuing the commit.

Maybe we could push handling for that down into the commit call itself
-- allow it to give up if "congested" goes true while we're blocked and
waiting for the the reply, but that sounds complicated...

Anyway, here's a proposed patch for the PF_FSTRANS scheme. It builds
but I haven't otherwise tested it yet. For now, it just covers the
socket allocation and ->releasepage. Eventually I guess we'll probably
want to set and check for PF_FSTRANS in other places:

-----------------[snip]------------------

[RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

The problem is pretty clear. rpciod is trying to allocate memory for a new
socket to talk to the server. The VM ends up calling ->releasepage to get
more memory, and it tries to do a blocking commit. That commit can't
succeed however without a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a blocking commit call.

Also, cc'ing Mel since I borrowed his tsk_restore_flags helper here, and
this patch may cause merge conflicts with some of his Swap-over-NFS
patches.

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/file.c                   |    8 ++++++--
 include/linux/sched.h           |    7 +++++++
 net/sunrpc/xprtrdma/transport.c |    4 +++-
 net/sunrpc/xprtsock.c           |   13 +++++++++++++
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a6708e6b..80cc621 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -463,8 +463,12 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
 		int how = FLUSH_SYNC;
 
-		/* Don't let kswapd deadlock waiting for OOM RPC calls */
-		if (current_is_kswapd())
+		/*
+		 * Don't let kswapd deadlock waiting for OOM RPC calls, and
+		 * don't recurse back into the fs if we know that we're trying
+		 * to reclaim memory for an fs-related allocation.
+		 */
+		if (current_is_kswapd() || current->flags & PF_FSTRANS)
 			how = 0;
 		nfs_commit_inode(mapping->host, how);
 	}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4059c0f..d71b523 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1889,6 +1889,13 @@ static inline void rcu_switch_from(struct task_struct *prev)
 
 #endif
 
+static inline void tsk_restore_flags(struct task_struct *p,
+			unsigned long pflags, unsigned long mask)
+{
+	p->flags &= ~mask;
+	p->flags |= pflags & mask;
+}
+
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p,
 			       const struct cpumask *new_mask);
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index b446e10..7b0f74b 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -197,9 +197,11 @@ xprt_rdma_connect_worker(struct work_struct *work)
 	struct rpcrdma_xprt *r_xprt =
 		container_of(work, struct rpcrdma_xprt, rdma_connect.work);
 	struct rpc_xprt *xprt = &r_xprt->xprt;
+	unsigned long pflags = current->flags;
 	int rc = 0;
 
 	if (!xprt->shutdown) {
+		current->flags |= PF_FSTRANS;
 		xprt_clear_connected(xprt);
 
 		dprintk("RPC:       %s: %sconnect\n", __func__,
@@ -212,10 +214,10 @@ xprt_rdma_connect_worker(struct work_struct *work)
 
 out:
 	xprt_wake_pending_tasks(xprt, rc);
-
 out_clear:
 	dprintk("RPC:       %s: exit\n", __func__);
 	xprt_clear_connecting(xprt);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 890b03f..6c7d8d5 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1890,11 +1890,14 @@ static void xs_local_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 	status = __sock_create(xprt->xprt_net, AF_LOCAL,
 					SOCK_STREAM, 0, &sock, 1);
@@ -1928,6 +1931,7 @@ static void xs_local_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
@@ -1965,11 +1969,14 @@ static void xs_udp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1988,6 +1995,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /*
@@ -2108,11 +2116,14 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2162,6 +2173,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	case -EINPROGRESS:
 	case -EALREADY:
 		xprt_clear_connecting(xprt);
+		tsk_restore_flags(current, pflags, PF_FSTRANS);
 		return;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
@@ -2174,6 +2186,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /**
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-28 13:19                 ` Jeff Layton
@ 2012-06-28 13:40                   ` Myklebust, Trond
  2012-06-28 14:47                     ` Jeff Layton
  2012-06-28 14:51                     ` [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket Jeff Layton
  0 siblings, 2 replies; 23+ messages in thread
From: Myklebust, Trond @ 2012-06-28 13:40 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Harshula, linux-nfs, mgorman

T24gVGh1LCAyMDEyLTA2LTI4IGF0IDA5OjE5IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
T24gV2VkLCAyNyBKdW4gMjAxMiAyMDo0Njo0OSArMDAwMA0KPiAiTXlrbGVidXN0LCBUcm9uZCIg
PFRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gDQo+ID4gT24gV2VkLCAyMDEy
LTA2LTI3IGF0IDE2OjE5IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4gPiA+IE9uIFdlZCwg
MjcgSnVuIDIwMTIgMTk6NTY6MzggKzAwMDANCj4gPiA+ICJNeWtsZWJ1c3QsIFRyb25kIiA8VHJv
bmQuTXlrbGVidXN0QG5ldGFwcC5jb20+IHdyb3RlOg0KPiA+ID4gDQo+ID4gPiA+IE9uIFdlZCwg
MjAxMi0wNi0yNyBhdCAxNToyOCAtMDQwMCwgSmVmZiBMYXl0b24gd3JvdGU6DQo+ID4gPiA+ID4g
T24gV2VkLCAyNyBKdW4gMjAxMiAxODo0Mzo1NiArMDAwMA0KPiA+ID4gPiA+ICJNeWtsZWJ1c3Qs
IFRyb25kIiA8VHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20+IHdyb3RlOg0KPiA+ID4gPiA+IElm
IHRoZXJlIHJlYWxseSBpcyBubyBhbHRlcm5hdGl2ZSB0byBmcmVlaW5nIHRoZSBzb2NrZXQsIHRo
ZW4gdGhlIG9ubHkNCj4gPiA+ID4gPiByZWFsIGZpeCBJIGNhbiBzZWUgaXMgdG8gc2V0IFBGX01F
TUFMTE9DIHdoZW4gd2UgZ28gdG8gY3JlYXRlIGl0IGFuZA0KPiA+ID4gPiA+IHRoZW4gcmVzZXQg
aXQgYWZ0ZXJ3YXJkLiBUaGF0J3MgYSBwcmV0dHkgdWdseSBmaXggdGhvdWdoLi4uDQo+ID4gPiA+
IA0KPiA+ID4gPiBJIGNhbiB0aGluayBvZiAyIHBvc3NpYmxlIGFsdGVybmF0aXZlczoNCj4gPiA+
ID4gDQo+ID4gPiA+ICAgICAgMS4gVXNlIHRoZSBQRl9GU1RSQU5TIGZsYWcgdG8gdGVsbCBuZnNf
cmVsZWFzZV9wYWdlIHRoYXQgdGhpcyBpcyBhDQo+ID4gPiA+ICAgICAgICAgZGlyZWN0IHJlY2xh
aW0gZnJvbSBycGNpb2QuDQo+ID4gPiANCj4gPiA+IEhtbS4uLnRoYXQncyBhIGJpdCBpbmRpc2Ny
aW1pbmF0ZSwgaXNuJ3QgaXQ/IExvb2tzIGxpa2Ugb25seSBYRlMgdXNlcw0KPiA+ID4gdGhhdCBm
bGFnIG5vdy4uLg0KPiA+ID4gDQo+ID4gPiBTdXBwb3NlIHdlJ3JlIGluIHNvbWUgWEZTIGFsbG9j
YXRpb24gYW5kIHRoZW4gd2FudCB0byBjb21taXQgYSBORlMNCj4gPiA+IHBhZ2UgdG8gc2F0aXNm
eSBpdC4gVGhlcmUncyBubyByZWFsIHJlYXNvbiB3ZSBjb3VsZG4ndCBidXQgdGhhdCB3b3VsZA0K
PiA+ID4gcHJldmVudCBpdC4uLg0KPiA+IA0KPiA+IFNvIGRvIGFsbCB0aGUgR0ZQX05PRlMgYWxs
b2NhdGlvbnMuIEhvdyBpcyB0aGlzIGFueSBkaWZmZXJlbnQ/DQo+ID4gDQo+IA0KPiBHb29kIHBv
aW50LiBOb3cgdGhhdCBJJ3ZlIGxvb2tlZCBjbG9zZXIsIEkgdGhpbmsgdGhpcyBpcyBwcm9iYWJs
eSB0aGUNCj4gYmVzdCBhcHByb2FjaC4gUHJvcG9zZWQgcGF0Y2ggYmVsb3cuIEl0IGJ1aWxkcyBi
dXQgaXMgb3RoZXJ3aXNlDQo+IHVudGVzdGVkIHNvIGZhci4uLg0KPiANCj4gPiA+ID4gICAgICAy
LiBBZGQgYSAnY29uZ2VzdGVkJyBmbGFnIHRvIHRoZSBycGMgY2xpZW50IHRoYXQgY291bGQgdGVs
bCByZWNsYWltDQo+ID4gPiA+ICAgICAgICAgdHlwZSBvcGVyYXRpb25zIHRvIGdpdmUgdXAgaWYg
YSBzb2NrZXQgYWxsb2NhdGlvbiBpcyBpbg0KPiA+ID4gPiAgICAgICAgIHByb2dyZXNzLg0KPiA+
ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhhdCdzIG5vdCBhIGJhZCBpZGVhLiBUaGF0IGhhcyB0aGUg
YmVuZWZpdCBvZiBvbmx5IHNraXBwaW5nIHJlY2xhaW0NCj4gPiA+IHdoZW4gaXQncyBkaXJlY3Rs
eSBhc3NvY2lhdGVkIHdpdGggdGhlIHNvY2tldCBiZWluZyByZWNvbm5lY3RlZC4gU2VlbXMNCj4g
PiA+IGxpa2UgdGhhdCB3b3VsZCBoYXZlIHRvIGhhbmcgb2ZmIG9mIHRoZSBycGNfeHBydCwgYnV0
IHdlIGNvdWxkIGRvDQo+ID4gPiBzb21ldGhpbmcgYWxvbmcgdGhvc2UgbGluZXMuIFByZXN1bWFi
bHkgTWVsJ3Mgc3dhcC1vdmVyLW5mcyBwYXRjaGVzDQo+ID4gPiBjb3VsZCBiZSBjb252ZXJ0ZWQg
dG8gZG8gc29tZXRoaW5nIHNpbWlsYXIgaW5zdGVhZCBvZiB1c2luZyBQRl9NRU1BTExPQw0KPiA+
ID4gdGhlcmUgdG9vLg0KPiA+IA0KPiANCj4gVGhlIHByb2JsZW0gd2l0aCB0aGUgYWJvdmUgc2No
ZW1lIGlzIHRoYXQgaXQgd291bGQgYmUgcmFjeS4gSXQgd291bGQNCj4gZWFzaWx5IGJlIHBvc3Np
YmxlIGZvciB0aGUgY29uZ2VzdGVkIGZsYWcgdG8gZmxpcCBqdXN0IGFmdGVyIHlvdQ0KPiBjaGVj
a2VkIGl0IGJ1dCBiZWZvcmUgaXNzdWluZyB0aGUgY29tbWl0Lg0KPiANCj4gTWF5YmUgd2UgY291
bGQgcHVzaCBoYW5kbGluZyBmb3IgdGhhdCBkb3duIGludG8gdGhlIGNvbW1pdCBjYWxsIGl0c2Vs
Zg0KPiAtLSBhbGxvdyBpdCB0byBnaXZlIHVwIGlmICJjb25nZXN0ZWQiIGdvZXMgdHJ1ZSB3aGls
ZSB3ZSdyZSBibG9ja2VkIGFuZA0KPiB3YWl0aW5nIGZvciB0aGUgdGhlIHJlcGx5LCBidXQgdGhh
dCBzb3VuZHMgY29tcGxpY2F0ZWQuLi4NCj4gDQo+IEFueXdheSwgaGVyZSdzIGEgcHJvcG9zZWQg
cGF0Y2ggZm9yIHRoZSBQRl9GU1RSQU5TIHNjaGVtZS4gSXQgYnVpbGRzDQo+IGJ1dCBJIGhhdmVu
J3Qgb3RoZXJ3aXNlIHRlc3RlZCBpdCB5ZXQuIEZvciBub3csIGl0IGp1c3QgY292ZXJzIHRoZQ0K
PiBzb2NrZXQgYWxsb2NhdGlvbiBhbmQgLT5yZWxlYXNlcGFnZS4gRXZlbnR1YWxseSBJIGd1ZXNz
IHdlJ2xsIHByb2JhYmx5DQo+IHdhbnQgdG8gc2V0IGFuZCBjaGVjayBmb3IgUEZfRlNUUkFOUyBp
biBvdGhlciBwbGFjZXM6DQoNCklzIHRoZXJlIGFueSByZWFzb24gd2h5IHdlIG1pZ2h0IG5vdCB3
YW50IGFsbCBycGNpb2Qgd29yayB0byBydW4gdW5kZXINClBGX0ZTVFJBTlM/IElPVzogY291bGRu
J3Qgd2UganVzdCBzZXQvY2xlYXIgaXQgdW5jb25kaXRpb25hbGx5IHdoZW4NCmVudGVyaW5nL2V4
aXRpbmcgcnBjX2FzeW5jX3NjaGVkdWxlKCksIGFuZCB4c18qX3NldHVwX3NvY2tldCgpPw0KDQpX
ZSBzaG91bGRuJ3QgbmVlZCB0byB3b3JyeSBhYm91dCB0aGUgcmVtYWluaW5nIHJwY2lvZCB3b3Jr
IHN0cnVjdHVyZXMgaW4NCnJwY190aW1lb3V0X3VwY2FsbF9xdWV1ZSBhbmQgeHBydF9hdXRvY2xv
c2UsIHNpbmNlIHRob3NlIGRvbid0IGV2ZXIgbmVlZA0KdG8gZG8gYW55IGFsbG9jYXRpb25zLg0K
DQo+IC0tLS0tLS0tLS0tLS0tLS0tW3NuaXBdLS0tLS0tLS0tLS0tLS0tLS0tDQo+IA0KPiBbUkZD
IFBBVENIXSBuZnM6IGRvIG5vbi1ibG9ja2luZyBjb21taXQgaW4gcmVsZWFzZXBhZ2UgaWYgd2Un
cmUgYXR0ZW1wdGluZyB0byByZWNvbm5lY3QgdGhlIHNvY2tldA0KPiANCj4gV2UndmUgaGFkIHNv
bWUgcmVwb3J0cyBvZiBhIGRlYWRsb2NrIHdoZXJlIHJwY2lvZCBlbmRzIHVwIHdpdGggYSBzdGFj
aw0KPiB0cmFjZSBsaWtlIHRoaXM6DQo+IA0KPiAgICAgUElEOiAyNTA3ICAgVEFTSzogZmZmZjg4
MTAzNjkxYWI0MCAgQ1BVOiAxNCAgQ09NTUFORDogInJwY2lvZC8xNCINCj4gICAgICAjMCBbZmZm
Zjg4MTAzNDNiZjJmMF0gc2NoZWR1bGUgYXQgZmZmZmZmZmY4MTRkYWJkOQ0KPiAgICAgICMxIFtm
ZmZmODgxMDM0M2JmM2I4XSBuZnNfd2FpdF9iaXRfa2lsbGFibGUgYXQgZmZmZmZmZmZhMDM4ZmMw
NCBbbmZzXQ0KPiAgICAgICMyIFtmZmZmODgxMDM0M2JmM2M4XSBfX3dhaXRfb25fYml0IGF0IGZm
ZmZmZmZmODE0ZGJjMmYNCj4gICAgICAjMyBbZmZmZjg4MTAzNDNiZjQxOF0gb3V0X29mX2xpbmVf
d2FpdF9vbl9iaXQgYXQgZmZmZmZmZmY4MTRkYmNkOA0KPiAgICAgICM0IFtmZmZmODgxMDM0M2Jm
NDg4XSBuZnNfY29tbWl0X2lub2RlIGF0IGZmZmZmZmZmYTAzOWUwYzEgW25mc10NCj4gICAgICAj
NSBbZmZmZjg4MTAzNDNiZjRmOF0gbmZzX3JlbGVhc2VfcGFnZSBhdCBmZmZmZmZmZmEwMzhiZWY2
IFtuZnNdDQo+ICAgICAgIzYgW2ZmZmY4ODEwMzQzYmY1MjhdIHRyeV90b19yZWxlYXNlX3BhZ2Ug
YXQgZmZmZmZmZmY4MTEwYzY3MA0KPiAgICAgICM3IFtmZmZmODgxMDM0M2JmNTM4XSBzaHJpbmtf
cGFnZV9saXN0LmNsb25lLjAgYXQgZmZmZmZmZmY4MTEyNjI3MQ0KPiAgICAgICM4IFtmZmZmODgx
MDM0M2JmNjY4XSBzaHJpbmtfaW5hY3RpdmVfbGlzdCBhdCBmZmZmZmZmZjgxMTI2NjM4DQo+ICAg
ICAgIzkgW2ZmZmY4ODEwMzQzYmY4MThdIHNocmlua196b25lIGF0IGZmZmZmZmZmODExMjc4OGYN
Cj4gICAgICMxMCBbZmZmZjg4MTAzNDNiZjhjOF0gZG9fdHJ5X3RvX2ZyZWVfcGFnZXMgYXQgZmZm
ZmZmZmY4MTEyN2IxZQ0KPiAgICAgIzExIFtmZmZmODgxMDM0M2JmOTU4XSB0cnlfdG9fZnJlZV9w
YWdlcyBhdCBmZmZmZmZmZjgxMTI4MTJmDQo+ICAgICAjMTIgW2ZmZmY4ODEwMzQzYmZhMDhdIF9f
YWxsb2NfcGFnZXNfbm9kZW1hc2sgYXQgZmZmZmZmZmY4MTExZmRhZA0KPiAgICAgIzEzIFtmZmZm
ODgxMDM0M2JmYjI4XSBrbWVtX2dldHBhZ2VzIGF0IGZmZmZmZmZmODExNTk5NDINCj4gICAgICMx
NCBbZmZmZjg4MTAzNDNiZmI1OF0gZmFsbGJhY2tfYWxsb2MgYXQgZmZmZmZmZmY4MTE1YTU1YQ0K
PiAgICAgIzE1IFtmZmZmODgxMDM0M2JmYmQ4XSBfX19fY2FjaGVfYWxsb2Nfbm9kZSBhdCBmZmZm
ZmZmZjgxMTVhMmQ5DQo+ICAgICAjMTYgW2ZmZmY4ODEwMzQzYmZjMzhdIGttZW1fY2FjaGVfYWxs
b2MgYXQgZmZmZmZmZmY4MTE1YjA5Yg0KPiAgICAgIzE3IFtmZmZmODgxMDM0M2JmYzc4XSBza19w
cm90X2FsbG9jIGF0IGZmZmZmZmZmODE0MTE4MDgNCj4gICAgICMxOCBbZmZmZjg4MTAzNDNiZmNi
OF0gc2tfYWxsb2MgYXQgZmZmZmZmZmY4MTQxMTk3Yw0KPiAgICAgIzE5IFtmZmZmODgxMDM0M2Jm
Y2U4XSBpbmV0X2NyZWF0ZSBhdCBmZmZmZmZmZjgxNDgzYmE2DQo+ICAgICAjMjAgW2ZmZmY4ODEw
MzQzYmZkMzhdIF9fc29ja19jcmVhdGUgYXQgZmZmZmZmZmY4MTQwYjRhNw0KPiAgICAgIzIxIFtm
ZmZmODgxMDM0M2JmZDk4XSB4c19jcmVhdGVfc29jayBhdCBmZmZmZmZmZmEwMWY2NDliIFtzdW5y
cGNdDQo+ICAgICAjMjIgW2ZmZmY4ODEwMzQzYmZkZDhdIHhzX3RjcF9zZXR1cF9zb2NrZXQgYXQg
ZmZmZmZmZmZhMDFmNjk2NSBbc3VucnBjXQ0KPiAgICAgIzIzIFtmZmZmODgxMDM0M2JmZTM4XSB3
b3JrZXJfdGhyZWFkIGF0IGZmZmZmZmZmODEwODg3ZDANCj4gICAgICMyNCBbZmZmZjg4MTAzNDNi
ZmVlOF0ga3RocmVhZCBhdCBmZmZmZmZmZjgxMDhkZDk2DQo+ICAgICAjMjUgW2ZmZmY4ODEwMzQz
YmZmNDhdIGtlcm5lbF90aHJlYWQgYXQgZmZmZmZmZmY4MTAwYzFjYQ0KPiANCj4gVGhlIHByb2Js
ZW0gaXMgcHJldHR5IGNsZWFyLiBycGNpb2QgaXMgdHJ5aW5nIHRvIGFsbG9jYXRlIG1lbW9yeSBm
b3IgYSBuZXcNCj4gc29ja2V0IHRvIHRhbGsgdG8gdGhlIHNlcnZlci4gVGhlIFZNIGVuZHMgdXAg
Y2FsbGluZyAtPnJlbGVhc2VwYWdlIHRvIGdldA0KPiBtb3JlIG1lbW9yeSwgYW5kIGl0IHRyaWVz
IHRvIGRvIGEgYmxvY2tpbmcgY29tbWl0LiBUaGF0IGNvbW1pdCBjYW4ndA0KPiBzdWNjZWVkIGhv
d2V2ZXIgd2l0aG91dCBhIGNvbm5lY3RlZCBzb2NrZXQsIHNvIHdlIGRlYWRsb2NrLg0KPiANCj4g
Rml4IHRoaXMgYnkgc2V0dGluZyBQRl9GU1RSQU5TIG9uIHRoZSB3b3JrcXVldWUgdGFzayBwcmlv
ciB0byBkb2luZyB0aGUNCj4gc29ja2V0IGFsbG9jYXRpb24sIGFuZCBoYXZpbmcgbmZzX3JlbGVh
c2VfcGFnZSBjaGVjayBmb3IgdGhhdCBmbGFnIHdoZW4NCj4gZGVjaWRpbmcgd2hldGhlciB0byBk
byBhIGJsb2NraW5nIGNvbW1pdCBjYWxsLg0KPiANCj4gQWxzbywgY2MnaW5nIE1lbCBzaW5jZSBJ
IGJvcnJvd2VkIGhpcyB0c2tfcmVzdG9yZV9mbGFncyBoZWxwZXIgaGVyZSwgYW5kDQo+IHRoaXMg
cGF0Y2ggbWF5IGNhdXNlIG1lcmdlIGNvbmZsaWN0cyB3aXRoIHNvbWUgb2YgaGlzIFN3YXAtb3Zl
ci1ORlMNCj4gcGF0Y2hlcy4NCj4gDQo+IENjOiBNZWwgR29ybWFuIDxtZ29ybWFuQHN1c2UuZGU+
DQo+IFNpZ25lZC1vZmYtYnk6IEplZmYgTGF5dG9uIDxqbGF5dG9uQHJlZGhhdC5jb20+DQo+IC0t
LQ0KPiAgZnMvbmZzL2ZpbGUuYyAgICAgICAgICAgICAgICAgICB8ICAgIDggKysrKysrLS0NCj4g
IGluY2x1ZGUvbGludXgvc2NoZWQuaCAgICAgICAgICAgfCAgICA3ICsrKysrKysNCj4gIG5ldC9z
dW5ycGMveHBydHJkbWEvdHJhbnNwb3J0LmMgfCAgICA0ICsrKy0NCj4gIG5ldC9zdW5ycGMveHBy
dHNvY2suYyAgICAgICAgICAgfCAgIDEzICsrKysrKysrKysrKysNCj4gIDQgZmlsZXMgY2hhbmdl
ZCwgMjkgaW5zZXJ0aW9ucygrKSwgMyBkZWxldGlvbnMoLSkNCj4gDQo+IGRpZmYgLS1naXQgYS9m
cy9uZnMvZmlsZS5jIGIvZnMvbmZzL2ZpbGUuYw0KPiBpbmRleCBhNjcwOGU2Yi4uODBjYzYyMSAx
MDA2NDQNCj4gLS0tIGEvZnMvbmZzL2ZpbGUuYw0KPiArKysgYi9mcy9uZnMvZmlsZS5jDQo+IEBA
IC00NjMsOCArNDYzLDEyIEBAIHN0YXRpYyBpbnQgbmZzX3JlbGVhc2VfcGFnZShzdHJ1Y3QgcGFn
ZSAqcGFnZSwgZ2ZwX3QgZ2ZwKQ0KPiAgCWlmIChtYXBwaW5nICYmIChnZnAgJiBHRlBfS0VSTkVM
KSA9PSBHRlBfS0VSTkVMKSB7DQo+ICAJCWludCBob3cgPSBGTFVTSF9TWU5DOw0KPiAgDQo+IC0J
CS8qIERvbid0IGxldCBrc3dhcGQgZGVhZGxvY2sgd2FpdGluZyBmb3IgT09NIFJQQyBjYWxscyAq
Lw0KPiAtCQlpZiAoY3VycmVudF9pc19rc3dhcGQoKSkNCj4gKwkJLyoNCj4gKwkJICogRG9uJ3Qg
bGV0IGtzd2FwZCBkZWFkbG9jayB3YWl0aW5nIGZvciBPT00gUlBDIGNhbGxzLCBhbmQNCj4gKwkJ
ICogZG9uJ3QgcmVjdXJzZSBiYWNrIGludG8gdGhlIGZzIGlmIHdlIGtub3cgdGhhdCB3ZSdyZSB0
cnlpbmcNCj4gKwkJICogdG8gcmVjbGFpbSBtZW1vcnkgZm9yIGFuIGZzLXJlbGF0ZWQgYWxsb2Nh
dGlvbi4NCj4gKwkJICovDQo+ICsJCWlmIChjdXJyZW50X2lzX2tzd2FwZCgpIHx8IGN1cnJlbnQt
PmZsYWdzICYgUEZfRlNUUkFOUykNCj4gIAkJCWhvdyA9IDA7DQo+ICAJCW5mc19jb21taXRfaW5v
ZGUobWFwcGluZy0+aG9zdCwgaG93KTsNCj4gIAl9DQo+IGRpZmYgLS1naXQgYS9pbmNsdWRlL2xp
bnV4L3NjaGVkLmggYi9pbmNsdWRlL2xpbnV4L3NjaGVkLmgNCj4gaW5kZXggNDA1OWMwZi4uZDcx
YjUyMyAxMDA2NDQNCj4gLS0tIGEvaW5jbHVkZS9saW51eC9zY2hlZC5oDQo+ICsrKyBiL2luY2x1
ZGUvbGludXgvc2NoZWQuaA0KPiBAQCAtMTg4OSw2ICsxODg5LDEzIEBAIHN0YXRpYyBpbmxpbmUg
dm9pZCByY3Vfc3dpdGNoX2Zyb20oc3RydWN0IHRhc2tfc3RydWN0ICpwcmV2KQ0KPiAgDQo+ICAj
ZW5kaWYNCj4gIA0KPiArc3RhdGljIGlubGluZSB2b2lkIHRza19yZXN0b3JlX2ZsYWdzKHN0cnVj
dCB0YXNrX3N0cnVjdCAqcCwNCj4gKwkJCXVuc2lnbmVkIGxvbmcgcGZsYWdzLCB1bnNpZ25lZCBs
b25nIG1hc2spDQo+ICt7DQo+ICsJcC0+ZmxhZ3MgJj0gfm1hc2s7DQo+ICsJcC0+ZmxhZ3MgfD0g
cGZsYWdzICYgbWFzazsNCj4gK30NCj4gKw0KPiAgI2lmZGVmIENPTkZJR19TTVANCj4gIGV4dGVy
biB2b2lkIGRvX3NldF9jcHVzX2FsbG93ZWQoc3RydWN0IHRhc2tfc3RydWN0ICpwLA0KPiAgCQkJ
ICAgICAgIGNvbnN0IHN0cnVjdCBjcHVtYXNrICpuZXdfbWFzayk7DQo+IGRpZmYgLS1naXQgYS9u
ZXQvc3VucnBjL3hwcnRyZG1hL3RyYW5zcG9ydC5jIGIvbmV0L3N1bnJwYy94cHJ0cmRtYS90cmFu
c3BvcnQuYw0KPiBpbmRleCBiNDQ2ZTEwLi43YjBmNzRiIDEwMDY0NA0KPiAtLS0gYS9uZXQvc3Vu
cnBjL3hwcnRyZG1hL3RyYW5zcG9ydC5jDQo+ICsrKyBiL25ldC9zdW5ycGMveHBydHJkbWEvdHJh
bnNwb3J0LmMNCj4gQEAgLTE5Nyw5ICsxOTcsMTEgQEAgeHBydF9yZG1hX2Nvbm5lY3Rfd29ya2Vy
KHN0cnVjdCB3b3JrX3N0cnVjdCAqd29yaykNCj4gIAlzdHJ1Y3QgcnBjcmRtYV94cHJ0ICpyX3hw
cnQgPQ0KPiAgCQljb250YWluZXJfb2Yod29yaywgc3RydWN0IHJwY3JkbWFfeHBydCwgcmRtYV9j
b25uZWN0LndvcmspOw0KPiAgCXN0cnVjdCBycGNfeHBydCAqeHBydCA9ICZyX3hwcnQtPnhwcnQ7
DQo+ICsJdW5zaWduZWQgbG9uZyBwZmxhZ3MgPSBjdXJyZW50LT5mbGFnczsNCj4gIAlpbnQgcmMg
PSAwOw0KPiAgDQo+ICAJaWYgKCF4cHJ0LT5zaHV0ZG93bikgew0KPiArCQljdXJyZW50LT5mbGFn
cyB8PSBQRl9GU1RSQU5TOw0KPiAgCQl4cHJ0X2NsZWFyX2Nvbm5lY3RlZCh4cHJ0KTsNCj4gIA0K
PiAgCQlkcHJpbnRrKCJSUEM6ICAgICAgICVzOiAlc2Nvbm5lY3RcbiIsIF9fZnVuY19fLA0KPiBA
QCAtMjEyLDEwICsyMTQsMTAgQEAgeHBydF9yZG1hX2Nvbm5lY3Rfd29ya2VyKHN0cnVjdCB3b3Jr
X3N0cnVjdCAqd29yaykNCj4gIA0KPiAgb3V0Og0KPiAgCXhwcnRfd2FrZV9wZW5kaW5nX3Rhc2tz
KHhwcnQsIHJjKTsNCj4gLQ0KPiAgb3V0X2NsZWFyOg0KPiAgCWRwcmludGsoIlJQQzogICAgICAg
JXM6IGV4aXRcbiIsIF9fZnVuY19fKTsNCj4gIAl4cHJ0X2NsZWFyX2Nvbm5lY3RpbmcoeHBydCk7
DQo+ICsJdHNrX3Jlc3RvcmVfZmxhZ3MoY3VycmVudCwgcGZsYWdzLCBQRl9GU1RSQU5TKTsNCj4g
IH0NCj4gIA0KPiAgLyoNCj4gZGlmZiAtLWdpdCBhL25ldC9zdW5ycGMveHBydHNvY2suYyBiL25l
dC9zdW5ycGMveHBydHNvY2suYw0KPiBpbmRleCA4OTBiMDNmLi42YzdkOGQ1IDEwMDY0NA0KPiAt
LS0gYS9uZXQvc3VucnBjL3hwcnRzb2NrLmMNCj4gKysrIGIvbmV0L3N1bnJwYy94cHJ0c29jay5j
DQo+IEBAIC0xODkwLDExICsxODkwLDE0IEBAIHN0YXRpYyB2b2lkIHhzX2xvY2FsX3NldHVwX3Nv
Y2tldChzdHJ1Y3Qgd29ya19zdHJ1Y3QgKndvcmspDQo+ICAJCWNvbnRhaW5lcl9vZih3b3JrLCBz
dHJ1Y3Qgc29ja194cHJ0LCBjb25uZWN0X3dvcmtlci53b3JrKTsNCj4gIAlzdHJ1Y3QgcnBjX3hw
cnQgKnhwcnQgPSAmdHJhbnNwb3J0LT54cHJ0Ow0KPiAgCXN0cnVjdCBzb2NrZXQgKnNvY2s7DQo+
ICsJdW5zaWduZWQgbG9uZyBwZmxhZ3MgPSBjdXJyZW50LT5mbGFnczsNCj4gIAlpbnQgc3RhdHVz
ID0gLUVJTzsNCj4gIA0KPiAgCWlmICh4cHJ0LT5zaHV0ZG93bikNCj4gIAkJZ290byBvdXQ7DQo+
ICANCj4gKwljdXJyZW50LT5mbGFncyB8PSBQRl9GU1RSQU5TOw0KPiArDQo+ICAJY2xlYXJfYml0
KFhQUlRfQ09OTkVDVElPTl9BQk9SVCwgJnhwcnQtPnN0YXRlKTsNCj4gIAlzdGF0dXMgPSBfX3Nv
Y2tfY3JlYXRlKHhwcnQtPnhwcnRfbmV0LCBBRl9MT0NBTCwNCj4gIAkJCQkJU09DS19TVFJFQU0s
IDAsICZzb2NrLCAxKTsNCj4gQEAgLTE5MjgsNiArMTkzMSw3IEBAIHN0YXRpYyB2b2lkIHhzX2xv
Y2FsX3NldHVwX3NvY2tldChzdHJ1Y3Qgd29ya19zdHJ1Y3QgKndvcmspDQo+ICBvdXQ6DQo+ICAJ
eHBydF9jbGVhcl9jb25uZWN0aW5nKHhwcnQpOw0KPiAgCXhwcnRfd2FrZV9wZW5kaW5nX3Rhc2tz
KHhwcnQsIHN0YXR1cyk7DQo+ICsJdHNrX3Jlc3RvcmVfZmxhZ3MoY3VycmVudCwgcGZsYWdzLCBQ
Rl9GU1RSQU5TKTsNCj4gIH0NCj4gIA0KPiAgc3RhdGljIHZvaWQgeHNfdWRwX2ZpbmlzaF9jb25u
ZWN0aW5nKHN0cnVjdCBycGNfeHBydCAqeHBydCwgc3RydWN0IHNvY2tldCAqc29jaykNCj4gQEAg
LTE5NjUsMTEgKzE5NjksMTQgQEAgc3RhdGljIHZvaWQgeHNfdWRwX3NldHVwX3NvY2tldChzdHJ1
Y3Qgd29ya19zdHJ1Y3QgKndvcmspDQo+ICAJCWNvbnRhaW5lcl9vZih3b3JrLCBzdHJ1Y3Qgc29j
a194cHJ0LCBjb25uZWN0X3dvcmtlci53b3JrKTsNCj4gIAlzdHJ1Y3QgcnBjX3hwcnQgKnhwcnQg
PSAmdHJhbnNwb3J0LT54cHJ0Ow0KPiAgCXN0cnVjdCBzb2NrZXQgKnNvY2sgPSB0cmFuc3BvcnQt
PnNvY2s7DQo+ICsJdW5zaWduZWQgbG9uZyBwZmxhZ3MgPSBjdXJyZW50LT5mbGFnczsNCj4gIAlp
bnQgc3RhdHVzID0gLUVJTzsNCj4gIA0KPiAgCWlmICh4cHJ0LT5zaHV0ZG93bikNCj4gIAkJZ290
byBvdXQ7DQo+ICANCj4gKwljdXJyZW50LT5mbGFncyB8PSBQRl9GU1RSQU5TOw0KPiArDQo+ICAJ
LyogU3RhcnQgYnkgcmVzZXR0aW5nIGFueSBleGlzdGluZyBzdGF0ZSAqLw0KPiAgCXhzX3Jlc2V0
X3RyYW5zcG9ydCh0cmFuc3BvcnQpOw0KPiAgCXNvY2sgPSB4c19jcmVhdGVfc29jayh4cHJ0LCB0
cmFuc3BvcnQsDQo+IEBAIC0xOTg4LDYgKzE5OTUsNyBAQCBzdGF0aWMgdm9pZCB4c191ZHBfc2V0
dXBfc29ja2V0KHN0cnVjdCB3b3JrX3N0cnVjdCAqd29yaykNCj4gIG91dDoNCj4gIAl4cHJ0X2Ns
ZWFyX2Nvbm5lY3RpbmcoeHBydCk7DQo+ICAJeHBydF93YWtlX3BlbmRpbmdfdGFza3MoeHBydCwg
c3RhdHVzKTsNCj4gKwl0c2tfcmVzdG9yZV9mbGFncyhjdXJyZW50LCBwZmxhZ3MsIFBGX0ZTVFJB
TlMpOw0KPiAgfQ0KPiAgDQo+ICAvKg0KPiBAQCAtMjEwOCwxMSArMjExNiwxNCBAQCBzdGF0aWMg
dm9pZCB4c190Y3Bfc2V0dXBfc29ja2V0KHN0cnVjdCB3b3JrX3N0cnVjdCAqd29yaykNCj4gIAkJ
Y29udGFpbmVyX29mKHdvcmssIHN0cnVjdCBzb2NrX3hwcnQsIGNvbm5lY3Rfd29ya2VyLndvcmsp
Ow0KPiAgCXN0cnVjdCBzb2NrZXQgKnNvY2sgPSB0cmFuc3BvcnQtPnNvY2s7DQo+ICAJc3RydWN0
IHJwY194cHJ0ICp4cHJ0ID0gJnRyYW5zcG9ydC0+eHBydDsNCj4gKwl1bnNpZ25lZCBsb25nIHBm
bGFncyA9IGN1cnJlbnQtPmZsYWdzOw0KPiAgCWludCBzdGF0dXMgPSAtRUlPOw0KPiAgDQo+ICAJ
aWYgKHhwcnQtPnNodXRkb3duKQ0KPiAgCQlnb3RvIG91dDsNCj4gIA0KPiArCWN1cnJlbnQtPmZs
YWdzIHw9IFBGX0ZTVFJBTlM7DQo+ICsNCj4gIAlpZiAoIXNvY2spIHsNCj4gIAkJY2xlYXJfYml0
KFhQUlRfQ09OTkVDVElPTl9BQk9SVCwgJnhwcnQtPnN0YXRlKTsNCj4gIAkJc29jayA9IHhzX2Ny
ZWF0ZV9zb2NrKHhwcnQsIHRyYW5zcG9ydCwNCj4gQEAgLTIxNjIsNiArMjE3Myw3IEBAIHN0YXRp
YyB2b2lkIHhzX3RjcF9zZXR1cF9zb2NrZXQoc3RydWN0IHdvcmtfc3RydWN0ICp3b3JrKQ0KPiAg
CWNhc2UgLUVJTlBST0dSRVNTOg0KPiAgCWNhc2UgLUVBTFJFQURZOg0KPiAgCQl4cHJ0X2NsZWFy
X2Nvbm5lY3RpbmcoeHBydCk7DQo+ICsJCXRza19yZXN0b3JlX2ZsYWdzKGN1cnJlbnQsIHBmbGFn
cywgUEZfRlNUUkFOUyk7DQo+ICAJCXJldHVybjsNCj4gIAljYXNlIC1FSU5WQUw6DQo+ICAJCS8q
IEhhcHBlbnMsIGZvciBpbnN0YW5jZSwgaWYgdGhlIHVzZXIgc3BlY2lmaWVkIGEgbGluaw0KPiBA
QCAtMjE3NCw2ICsyMTg2LDcgQEAgb3V0X2VhZ2FpbjoNCj4gIG91dDoNCj4gIAl4cHJ0X2NsZWFy
X2Nvbm5lY3RpbmcoeHBydCk7DQo+ICAJeHBydF93YWtlX3BlbmRpbmdfdGFza3MoeHBydCwgc3Rh
dHVzKTsNCj4gKwl0c2tfcmVzdG9yZV9mbGFncyhjdXJyZW50LCBwZmxhZ3MsIFBGX0ZTVFJBTlMp
Ow0KPiAgfQ0KPiAgDQo+ICAvKioNCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBj
bGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3
d3cubmV0YXBwLmNvbQ0KDQo=

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-20 14:14     ` Jeff Layton
@ 2012-06-28 14:31       ` Mel Gorman
  2012-06-28 14:55         ` Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Mel Gorman @ 2012-06-28 14:31 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Myklebust, Trond, Harshula, linux-nfs

I am not on the linux-nfs list so I lack context about this issue so
take all this with a grain of salt.

On Wed, Jun 20, 2012 at 07:14:22AM -0700, Jeff Layton wrote:
> On Fri, 15 Jun 2012 21:25:10 +0000
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> 
> > On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote:
> > > On Fri, 15 Jun 2012 22:54:10 +1000
> > > Harshula <harshula@redhat.com> wrote:
> > > 
> > > > Hi All,
> > > > 
> > > > Can I please get your feedback on the following?
> > > > 
> > > > rpciod is blocked:
> > > > ------------------------------------------------------------
> > > > crash> bt  2507 
> > > > PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
> > > >  #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
> > > >  #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
> > > >  #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
> > > >  #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
> > > >  #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
> > > >  #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
> > > >  #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
> > > >  #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
> > > >  #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
> > > >  #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
> > > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
> > > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
> > > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
> > > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
> > > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
> > > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
> > > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
> > > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
> > > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
> > > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
> > > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
> > > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
> > > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
> > > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
> > > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
> > > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
> > > > ------------------------------------------------------------
> > > > 
> > > > nfs_release_page() gives kswapd process an exemption from being blocked.
> > > > Should we do the same for rpciod. Otherwise rpciod could end up in a
> > > > deadlock where it can not continue without freeing memory that will only
> > > > become available when rpciod does its work:

Typically processes don't get an exception as such. Filesystems and
callers involved in IO are meant to use GFP_NOFS and GFP_NOIO as
appropriate to stop reclaim causing them to block on themselves.

> > > > ------------------------------------------------------------
> > > > 479 /*
> > > > 480  * Attempt to release the private state associated with a page
> > > > 481  * - Called if either PG_private or PG_fscache is set on the page
> > > > 482  * - Caller holds page lock
> > > > 483  * - Return true (may release page) or false (may not)
> > > > 484  */
> > > > 485 static int nfs_release_page(struct page *page, gfp_t gfp)
> > > > 486 {   
> > > > 487         struct address_space *mapping = page->mapping;
> > > > 488     
> > > > 489         dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
> > > > 490     
> > > > 491         /* Only do I/O if gfp is a superset of GFP_KERNEL */
> > > > 492         if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> > > > 493                 int how = FLUSH_SYNC;
> > > > 494             
> > > > 495                 /* Don't let kswapd deadlock waiting for OOM RPC calls */
> > > > 496                 if (current_is_kswapd())
> > > > 497                         how = 0;
> > > > 498                 nfs_commit_inode(mapping->host, how);
> > > > 499         }
> > > > 500         /* If PagePrivate() is set, then the page is not freeable */
> > > > 501         if (PagePrivate(page))
> > > > 502                 return 0;
> > > > 503         return nfs_fscache_release_page(page, gfp);
> > > > 504 }
> > > > ------------------------------------------------------------
> > > > 
> > > > Another option is to change the priority of the memory allocation:
> > > > net/ipv4/af_inet.c
> > > > ------------------------------------------------------------
> > > >  265 static int inet_create(struct net *net, struct socket *sock, int
> > > > protocol,
> > > >  266                        int kern)
> > > >  267 {
> > > > ...
> > > >  345         sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);
> > > > ------------------------------------------------------------
> > > > Considering this is generic net code, I assume the GFP_KERNEL will not
> > > > be replaced with GFP_ATOMIC.
> > > > 
> > > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses
> > > > 'legacy' workqueue code.
> > > > 
> > > > cya,
> > > > #
> > > > 
> > > 
> > > I suspect this is also a problem in mainline, but maybe some of the
> > > recent writeback changes prevent it...
> > > 
> > > I think the right solution here is to make nfs_release_page treat rpciod
> > > similarly to kswapd. Easier said than done though -- you'll need to
> > > come up with a way to determine if you're running in rpciod context...
> > 
> > No. The _right_ solution is to ensure that rpciod doesn't do allocations
> > that result in a page reclaim... try_to_release_page() is just the tip
> > of the iceberg of crazy deadlocks that this socket allocation can get us
> > into.
> > 

Fully agreed. Callers that cannot dispatch IO or do filesystem based
operations should not specify __GFP_FS or __GFP_IO.

> > Unfortunately, selinux & co. prevent us from allocating the sockets in
> > user contexts, and anyway, having to wait for another thread to do the
> > same allocation isn't doing to help prevent the deadlock...
> > 
> 
> My thinking was that we could return without releasing the page, and
> eventually kmalloc would get to one that was able to be reclaimed via
> other means. That's probably too optimistic though...
> 

This is what is meant to happen if you use GFP flags appropriately. The
pages that cannot be reclaimed are skipped over until a clean page can
be discarded for example. In some cases it might still be possible to
swap out a page.

> > I know that Mel Gorman's NFS swap patches had some protections against
> > this sort of deadlock. Perhaps we can look at how he was doing this?
> > 
> 
> (cc'ing Mel in case he has thoughts)
> 

This is not what those patches are for. They are specific to the case
where the network traffic is related to swapping. It allows temporary
overriding of the watermarks to receive and send packets related to swap
and discards all others. It is not a general solution for the sort of
problem you are seeing.

> I looked over Mel's patches and they mainly seem to affect how memory
> for the socket is handled after it has been created. In this case, we're
> hanging on the allocation of a new socket altogether.
> 
> It does do this before calling xs_create_sock however:
> 
> +	if (xprt->swapper)
> +		current->flags |= PF_MEMALLOC;
> +
> 
> Perhaps we should just set PF_MEMALLOC unconditionally there?

No, that will cause the system to deadlock under very heavy memory
pressure. It's not always easy to trigger such a situation but it is always
the risk with PF_MEMALLOC is abused.

> We might
> still need Mel's other patches that add SOCK_MEMALLOC to ensure that it
> can actually be connected, but that's a bit of a different problem.
> 

Do not do this. The SOCK_MEMALLOC stuff is specific to swapping. It is
not a substitute to avoid using the correct flags. Abusing PF_MEMALLOC
leads to deadlock.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-28 13:40                   ` Myklebust, Trond
@ 2012-06-28 14:47                     ` Jeff Layton
  2012-06-28 14:51                     ` [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket Jeff Layton
  1 sibling, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2012-06-28 14:47 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Harshula, linux-nfs, mgorman

On Thu, 28 Jun 2012 13:40:37 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Thu, 2012-06-28 at 09:19 -0400, Jeff Layton wrote:
> > On Wed, 27 Jun 2012 20:46:49 +0000
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Wed, 2012-06-27 at 16:19 -0400, Jeff Layton wrote:
> > > > On Wed, 27 Jun 2012 19:56:38 +0000
> > > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > > > 
> > > > > On Wed, 2012-06-27 at 15:28 -0400, Jeff Layton wrote:
> > > > > > On Wed, 27 Jun 2012 18:43:56 +0000
> > > > > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > > > > > If there really is no alternative to freeing the socket, then the only
> > > > > > real fix I can see is to set PF_MEMALLOC when we go to create it and
> > > > > > then reset it afterward. That's a pretty ugly fix though...
> > > > > 
> > > > > I can think of 2 possible alternatives:
> > > > > 
> > > > >      1. Use the PF_FSTRANS flag to tell nfs_release_page that this is a
> > > > >         direct reclaim from rpciod.
> > > > 
> > > > Hmm...that's a bit indiscriminate, isn't it? Looks like only XFS uses
> > > > that flag now...
> > > > 
> > > > Suppose we're in some XFS allocation and then want to commit a NFS
> > > > page to satisfy it. There's no real reason we couldn't but that would
> > > > prevent it...
> > > 
> > > So do all the GFP_NOFS allocations. How is this any different?
> > > 
> > 
> > Good point. Now that I've looked closer, I think this is probably the
> > best approach. Proposed patch below. It builds but is otherwise
> > untested so far...
> > 
> > > > >      2. Add a 'congested' flag to the rpc client that could tell reclaim
> > > > >         type operations to give up if a socket allocation is in
> > > > >         progress.
> > > > > 
> > > > 
> > > > That's not a bad idea. That has the benefit of only skipping reclaim
> > > > when it's directly associated with the socket being reconnected. Seems
> > > > like that would have to hang off of the rpc_xprt, but we could do
> > > > something along those lines. Presumably Mel's swap-over-nfs patches
> > > > could be converted to do something similar instead of using PF_MEMALLOC
> > > > there too.
> > > 
> > 
> > The problem with the above scheme is that it would be racy. It would
> > easily be possible for the congested flag to flip just after you
> > checked it but before issuing the commit.
> > 
> > Maybe we could push handling for that down into the commit call itself
> > -- allow it to give up if "congested" goes true while we're blocked and
> > waiting for the the reply, but that sounds complicated...
> > 
> > Anyway, here's a proposed patch for the PF_FSTRANS scheme. It builds
> > but I haven't otherwise tested it yet. For now, it just covers the
> > socket allocation and ->releasepage. Eventually I guess we'll probably
> > want to set and check for PF_FSTRANS in other places:
> 
> Is there any reason why we might not want all rpciod work to run under
> PF_FSTRANS? IOW: couldn't we just set/clear it unconditionally when
> entering/exiting rpc_async_schedule(), and xs_*_setup_socket()?
> 
> We shouldn't need to worry about the remaining rpciod work structures in
> rpc_timeout_upcall_queue and xprt_autoclose, since those don't ever need
> to do any allocations.
> 


I wasn't sure that that was really necessary. Just because we have
rpciod sleeping and waiting for memory to be reclaimed doesn't
necessarily mean that other RPCs can't go through.

IIUC, the CMWQ implementation will spawn a new execution context as
needed, even on a UP box, because we have WQ_MEM_RECLAIM set. The socket
reconnect is a bit of a special case since that *has* to be done before
anything can happen.

Still, I suppose there is a possible corner case here. If we're blocked
while in the middle of trying to commit, then the releasepage's commit
might end up stuck behind the commit_lock. I'll send a revised patch
separately...

Thanks for the help so far...
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket
  2012-06-28 13:40                   ` Myklebust, Trond
  2012-06-28 14:47                     ` Jeff Layton
@ 2012-06-28 14:51                     ` Jeff Layton
  2012-06-28 15:02                       ` Myklebust, Trond
  1 sibling, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2012-06-28 14:51 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: harshula, linux-nfs, mgorman

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

The problem is pretty clear. rpciod is trying to allocate memory for a new
socket to talk to the server. The VM ends up calling ->releasepage to get
more memory, and it tries to do a blocking commit. That commit can't
succeed however without a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a blocking commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Also, cc'ing Mel since I borrowed his tsk_restore_flags helper here, and
this patch may cause merge conflicts with some of his Swap-over-NFS
patches.

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/file.c                   |    8 ++++++--
 include/linux/sched.h           |    7 +++++++
 net/sunrpc/sched.c              |    4 ++++
 net/sunrpc/xprtrdma/transport.c |    4 +++-
 net/sunrpc/xprtsock.c           |   13 +++++++++++++
 5 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a6708e6b..80cc621 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -463,8 +463,12 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
 		int how = FLUSH_SYNC;
 
-		/* Don't let kswapd deadlock waiting for OOM RPC calls */
-		if (current_is_kswapd())
+		/*
+		 * Don't let kswapd deadlock waiting for OOM RPC calls, and
+		 * don't recurse back into the fs if we know that we're trying
+		 * to reclaim memory for an fs-related allocation.
+		 */
+		if (current_is_kswapd() || current->flags & PF_FSTRANS)
 			how = 0;
 		nfs_commit_inode(mapping->host, how);
 	}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4059c0f..d71b523 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1889,6 +1889,13 @@ static inline void rcu_switch_from(struct task_struct *prev)
 
 #endif
 
+static inline void tsk_restore_flags(struct task_struct *p,
+			unsigned long pflags, unsigned long mask)
+{
+	p->flags &= ~mask;
+	p->flags |= pflags & mask;
+}
+
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p,
 			       const struct cpumask *new_mask);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 994cfea..5c10ae6 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -790,7 +790,11 @@ void rpc_execute(struct rpc_task *task)
 
 static void rpc_async_schedule(struct work_struct *work)
 {
+	unsigned int pflags = current->flags;
+
+	current->flags |= PF_FSTRANS;
 	__rpc_execute(container_of(work, struct rpc_task, u.tk_work));
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /**
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index b446e10..7b0f74b 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -197,9 +197,11 @@ xprt_rdma_connect_worker(struct work_struct *work)
 	struct rpcrdma_xprt *r_xprt =
 		container_of(work, struct rpcrdma_xprt, rdma_connect.work);
 	struct rpc_xprt *xprt = &r_xprt->xprt;
+	unsigned long pflags = current->flags;
 	int rc = 0;
 
 	if (!xprt->shutdown) {
+		current->flags |= PF_FSTRANS;
 		xprt_clear_connected(xprt);
 
 		dprintk("RPC:       %s: %sconnect\n", __func__,
@@ -212,10 +214,10 @@ xprt_rdma_connect_worker(struct work_struct *work)
 
 out:
 	xprt_wake_pending_tasks(xprt, rc);
-
 out_clear:
 	dprintk("RPC:       %s: exit\n", __func__);
 	xprt_clear_connecting(xprt);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 890b03f..6c7d8d5 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1890,11 +1890,14 @@ static void xs_local_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 	status = __sock_create(xprt->xprt_net, AF_LOCAL,
 					SOCK_STREAM, 0, &sock, 1);
@@ -1928,6 +1931,7 @@ static void xs_local_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
@@ -1965,11 +1969,14 @@ static void xs_udp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1988,6 +1995,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /*
@@ -2108,11 +2116,14 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2162,6 +2173,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	case -EINPROGRESS:
 	case -EALREADY:
 		xprt_clear_connecting(xprt);
+		tsk_restore_flags(current, pflags, PF_FSTRANS);
 		return;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
@@ -2174,6 +2186,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_FSTRANS);
 }
 
 /**
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete
  2012-06-28 14:31       ` Mel Gorman
@ 2012-06-28 14:55         ` Jeff Layton
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2012-06-28 14:55 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Myklebust, Trond, Harshula, linux-nfs

On Thu, 28 Jun 2012 15:31:57 +0100
Mel Gorman <mgorman@suse.de> wrote:

> I am not on the linux-nfs list so I lack context about this issue so
> take all this with a grain of salt.
> 
> On Wed, Jun 20, 2012 at 07:14:22AM -0700, Jeff Layton wrote:
> > On Fri, 15 Jun 2012 21:25:10 +0000
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Fri, 2012-06-15 at 09:21 -0400, Jeff Layton wrote:
> > > > On Fri, 15 Jun 2012 22:54:10 +1000
> > > > Harshula <harshula@redhat.com> wrote:
> > > > 
> > > > > Hi All,
> > > > > 
> > > > > Can I please get your feedback on the following?
> > > > > 
> > > > > rpciod is blocked:
> > > > > ------------------------------------------------------------
> > > > > crash> bt  2507 
> > > > > PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
> > > > >  #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
> > > > >  #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
> > > > >  #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
> > > > >  #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
> > > > >  #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
> > > > >  #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
> > > > >  #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
> > > > >  #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
> > > > >  #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
> > > > >  #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
> > > > > #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
> > > > > #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
> > > > > #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
> > > > > #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
> > > > > #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
> > > > > #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
> > > > > #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
> > > > > #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
> > > > > #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
> > > > > #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
> > > > > #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
> > > > > #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
> > > > > #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
> > > > > #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
> > > > > #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
> > > > > #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
> > > > > ------------------------------------------------------------
> > > > > 
> > > > > nfs_release_page() gives kswapd process an exemption from being blocked.
> > > > > Should we do the same for rpciod. Otherwise rpciod could end up in a
> > > > > deadlock where it can not continue without freeing memory that will only
> > > > > become available when rpciod does its work:
> 
> Typically processes don't get an exception as such. Filesystems and
> callers involved in IO are meant to use GFP_NOFS and GFP_NOIO as
> appropriate to stop reclaim causing them to block on themselves.
> 
> > > > > ------------------------------------------------------------
> > > > > 479 /*
> > > > > 480  * Attempt to release the private state associated with a page
> > > > > 481  * - Called if either PG_private or PG_fscache is set on the page
> > > > > 482  * - Caller holds page lock
> > > > > 483  * - Return true (may release page) or false (may not)
> > > > > 484  */
> > > > > 485 static int nfs_release_page(struct page *page, gfp_t gfp)
> > > > > 486 {   
> > > > > 487         struct address_space *mapping = page->mapping;
> > > > > 488     
> > > > > 489         dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
> > > > > 490     
> > > > > 491         /* Only do I/O if gfp is a superset of GFP_KERNEL */
> > > > > 492         if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> > > > > 493                 int how = FLUSH_SYNC;
> > > > > 494             
> > > > > 495                 /* Don't let kswapd deadlock waiting for OOM RPC calls */
> > > > > 496                 if (current_is_kswapd())
> > > > > 497                         how = 0;
> > > > > 498                 nfs_commit_inode(mapping->host, how);
> > > > > 499         }
> > > > > 500         /* If PagePrivate() is set, then the page is not freeable */
> > > > > 501         if (PagePrivate(page))
> > > > > 502                 return 0;
> > > > > 503         return nfs_fscache_release_page(page, gfp);
> > > > > 504 }
> > > > > ------------------------------------------------------------
> > > > > 
> > > > > Another option is to change the priority of the memory allocation:
> > > > > net/ipv4/af_inet.c
> > > > > ------------------------------------------------------------
> > > > >  265 static int inet_create(struct net *net, struct socket *sock, int
> > > > > protocol,
> > > > >  266                        int kern)
> > > > >  267 {
> > > > > ...
> > > > >  345         sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot);
> > > > > ------------------------------------------------------------
> > > > > Considering this is generic net code, I assume the GFP_KERNEL will not
> > > > > be replaced with GFP_ATOMIC.
> > > > > 
> > > > > NOTE, this is on RHEL 6.1 kernel 2.6.32-131.6.1 and apparently uses
> > > > > 'legacy' workqueue code.
> > > > > 
> > > > > cya,
> > > > > #
> > > > > 
> > > > 
> > > > I suspect this is also a problem in mainline, but maybe some of the
> > > > recent writeback changes prevent it...
> > > > 
> > > > I think the right solution here is to make nfs_release_page treat rpciod
> > > > similarly to kswapd. Easier said than done though -- you'll need to
> > > > come up with a way to determine if you're running in rpciod context...
> > > 
> > > No. The _right_ solution is to ensure that rpciod doesn't do allocations
> > > that result in a page reclaim... try_to_release_page() is just the tip
> > > of the iceberg of crazy deadlocks that this socket allocation can get us
> > > into.
> > > 
> 
> Fully agreed. Callers that cannot dispatch IO or do filesystem based
> operations should not specify __GFP_FS or __GFP_IO.
> 
> > > Unfortunately, selinux & co. prevent us from allocating the sockets in
> > > user contexts, and anyway, having to wait for another thread to do the
> > > same allocation isn't doing to help prevent the deadlock...
> > > 
> > 
> > My thinking was that we could return without releasing the page, and
> > eventually kmalloc would get to one that was able to be reclaimed via
> > other means. That's probably too optimistic though...
> > 
> 
> This is what is meant to happen if you use GFP flags appropriately. The
> pages that cannot be reclaimed are skipped over until a clean page can
> be discarded for example. In some cases it might still be possible to
> swap out a page.
> 
> > > I know that Mel Gorman's NFS swap patches had some protections against
> > > this sort of deadlock. Perhaps we can look at how he was doing this?
> > > 
> > 
> > (cc'ing Mel in case he has thoughts)
> > 
> 
> This is not what those patches are for. They are specific to the case
> where the network traffic is related to swapping. It allows temporary
> overriding of the watermarks to receive and send packets related to swap
> and discards all others. It is not a general solution for the sort of
> problem you are seeing.
> 
> > I looked over Mel's patches and they mainly seem to affect how memory
> > for the socket is handled after it has been created. In this case, we're
> > hanging on the allocation of a new socket altogether.
> > 
> > It does do this before calling xs_create_sock however:
> > 
> > +	if (xprt->swapper)
> > +		current->flags |= PF_MEMALLOC;
> > +
> > 
> > Perhaps we should just set PF_MEMALLOC unconditionally there?
> 
> No, that will cause the system to deadlock under very heavy memory
> pressure. It's not always easy to trigger such a situation but it is always
> the risk with PF_MEMALLOC is abused.
> 
> > We might
> > still need Mel's other patches that add SOCK_MEMALLOC to ensure that it
> > can actually be connected, but that's a bit of a different problem.
> > 
> 
> Do not do this. The SOCK_MEMALLOC stuff is specific to swapping. It is
> not a substitute to avoid using the correct flags. Abusing PF_MEMALLOC
> leads to deadlock.
> 

Ok, thanks for having a look. Sorry to add you in the middle of the
thread, that does make it a bit difficult to understand what's
happening.

It would be nice if we could pass GFP flags appropriately here, but the
problem is that the GFP flags for the socket allocation are buried way
down deep inside of inet_create(). See the sk_alloc() call in there.

Now, we could fix it up so that __sock_create passes a set of GFP flags
all the way down to that point, but even that won't really save us here
in all cases. There's a sock_alloc() call in there which will need an
inode. __sock_create also does a request_module() and that can
certainly require memory too. Good luck getting that to take GFP flags.

The right approach I think is to do what Trond suggested an an earlier
mail and leverage the PF_FSTRANS flag...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket
  2012-06-28 14:51                     ` [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket Jeff Layton
@ 2012-06-28 15:02                       ` Myklebust, Trond
  2012-06-28 15:25                         ` [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Myklebust, Trond @ 2012-06-28 15:02 UTC (permalink / raw)
  To: Jeff Layton; +Cc: harshula, linux-nfs, mgorman

T24gVGh1LCAyMDEyLTA2LTI4IGF0IDEwOjUxIC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
V2UndmUgaGFkIHNvbWUgcmVwb3J0cyBvZiBhIGRlYWRsb2NrIHdoZXJlIHJwY2lvZCBlbmRzIHVw
IHdpdGggYSBzdGFjaw0KPiB0cmFjZSBsaWtlIHRoaXM6DQo+IA0KPiAgICAgUElEOiAyNTA3ICAg
VEFTSzogZmZmZjg4MTAzNjkxYWI0MCAgQ1BVOiAxNCAgQ09NTUFORDogInJwY2lvZC8xNCINCj4g
ICAgICAjMCBbZmZmZjg4MTAzNDNiZjJmMF0gc2NoZWR1bGUgYXQgZmZmZmZmZmY4MTRkYWJkOQ0K
PiAgICAgICMxIFtmZmZmODgxMDM0M2JmM2I4XSBuZnNfd2FpdF9iaXRfa2lsbGFibGUgYXQgZmZm
ZmZmZmZhMDM4ZmMwNCBbbmZzXQ0KPiAgICAgICMyIFtmZmZmODgxMDM0M2JmM2M4XSBfX3dhaXRf
b25fYml0IGF0IGZmZmZmZmZmODE0ZGJjMmYNCj4gICAgICAjMyBbZmZmZjg4MTAzNDNiZjQxOF0g
b3V0X29mX2xpbmVfd2FpdF9vbl9iaXQgYXQgZmZmZmZmZmY4MTRkYmNkOA0KPiAgICAgICM0IFtm
ZmZmODgxMDM0M2JmNDg4XSBuZnNfY29tbWl0X2lub2RlIGF0IGZmZmZmZmZmYTAzOWUwYzEgW25m
c10NCj4gICAgICAjNSBbZmZmZjg4MTAzNDNiZjRmOF0gbmZzX3JlbGVhc2VfcGFnZSBhdCBmZmZm
ZmZmZmEwMzhiZWY2IFtuZnNdDQo+ICAgICAgIzYgW2ZmZmY4ODEwMzQzYmY1MjhdIHRyeV90b19y
ZWxlYXNlX3BhZ2UgYXQgZmZmZmZmZmY4MTEwYzY3MA0KPiAgICAgICM3IFtmZmZmODgxMDM0M2Jm
NTM4XSBzaHJpbmtfcGFnZV9saXN0LmNsb25lLjAgYXQgZmZmZmZmZmY4MTEyNjI3MQ0KPiAgICAg
ICM4IFtmZmZmODgxMDM0M2JmNjY4XSBzaHJpbmtfaW5hY3RpdmVfbGlzdCBhdCBmZmZmZmZmZjgx
MTI2NjM4DQo+ICAgICAgIzkgW2ZmZmY4ODEwMzQzYmY4MThdIHNocmlua196b25lIGF0IGZmZmZm
ZmZmODExMjc4OGYNCj4gICAgICMxMCBbZmZmZjg4MTAzNDNiZjhjOF0gZG9fdHJ5X3RvX2ZyZWVf
cGFnZXMgYXQgZmZmZmZmZmY4MTEyN2IxZQ0KPiAgICAgIzExIFtmZmZmODgxMDM0M2JmOTU4XSB0
cnlfdG9fZnJlZV9wYWdlcyBhdCBmZmZmZmZmZjgxMTI4MTJmDQo+ICAgICAjMTIgW2ZmZmY4ODEw
MzQzYmZhMDhdIF9fYWxsb2NfcGFnZXNfbm9kZW1hc2sgYXQgZmZmZmZmZmY4MTExZmRhZA0KPiAg
ICAgIzEzIFtmZmZmODgxMDM0M2JmYjI4XSBrbWVtX2dldHBhZ2VzIGF0IGZmZmZmZmZmODExNTk5
NDINCj4gICAgICMxNCBbZmZmZjg4MTAzNDNiZmI1OF0gZmFsbGJhY2tfYWxsb2MgYXQgZmZmZmZm
ZmY4MTE1YTU1YQ0KPiAgICAgIzE1IFtmZmZmODgxMDM0M2JmYmQ4XSBfX19fY2FjaGVfYWxsb2Nf
bm9kZSBhdCBmZmZmZmZmZjgxMTVhMmQ5DQo+ICAgICAjMTYgW2ZmZmY4ODEwMzQzYmZjMzhdIGtt
ZW1fY2FjaGVfYWxsb2MgYXQgZmZmZmZmZmY4MTE1YjA5Yg0KPiAgICAgIzE3IFtmZmZmODgxMDM0
M2JmYzc4XSBza19wcm90X2FsbG9jIGF0IGZmZmZmZmZmODE0MTE4MDgNCj4gICAgICMxOCBbZmZm
Zjg4MTAzNDNiZmNiOF0gc2tfYWxsb2MgYXQgZmZmZmZmZmY4MTQxMTk3Yw0KPiAgICAgIzE5IFtm
ZmZmODgxMDM0M2JmY2U4XSBpbmV0X2NyZWF0ZSBhdCBmZmZmZmZmZjgxNDgzYmE2DQo+ICAgICAj
MjAgW2ZmZmY4ODEwMzQzYmZkMzhdIF9fc29ja19jcmVhdGUgYXQgZmZmZmZmZmY4MTQwYjRhNw0K
PiAgICAgIzIxIFtmZmZmODgxMDM0M2JmZDk4XSB4c19jcmVhdGVfc29jayBhdCBmZmZmZmZmZmEw
MWY2NDliIFtzdW5ycGNdDQo+ICAgICAjMjIgW2ZmZmY4ODEwMzQzYmZkZDhdIHhzX3RjcF9zZXR1
cF9zb2NrZXQgYXQgZmZmZmZmZmZhMDFmNjk2NSBbc3VucnBjXQ0KPiAgICAgIzIzIFtmZmZmODgx
MDM0M2JmZTM4XSB3b3JrZXJfdGhyZWFkIGF0IGZmZmZmZmZmODEwODg3ZDANCj4gICAgICMyNCBb
ZmZmZjg4MTAzNDNiZmVlOF0ga3RocmVhZCBhdCBmZmZmZmZmZjgxMDhkZDk2DQo+ICAgICAjMjUg
W2ZmZmY4ODEwMzQzYmZmNDhdIGtlcm5lbF90aHJlYWQgYXQgZmZmZmZmZmY4MTAwYzFjYQ0KPiAN
Cj4gVGhlIHByb2JsZW0gaXMgcHJldHR5IGNsZWFyLiBycGNpb2QgaXMgdHJ5aW5nIHRvIGFsbG9j
YXRlIG1lbW9yeSBmb3IgYSBuZXcNCj4gc29ja2V0IHRvIHRhbGsgdG8gdGhlIHNlcnZlci4gVGhl
IFZNIGVuZHMgdXAgY2FsbGluZyAtPnJlbGVhc2VwYWdlIHRvIGdldA0KPiBtb3JlIG1lbW9yeSwg
YW5kIGl0IHRyaWVzIHRvIGRvIGEgYmxvY2tpbmcgY29tbWl0LiBUaGF0IGNvbW1pdCBjYW4ndA0K
PiBzdWNjZWVkIGhvd2V2ZXIgd2l0aG91dCBhIGNvbm5lY3RlZCBzb2NrZXQsIHNvIHdlIGRlYWRs
b2NrLg0KPiANCj4gRml4IHRoaXMgYnkgc2V0dGluZyBQRl9GU1RSQU5TIG9uIHRoZSB3b3JrcXVl
dWUgdGFzayBwcmlvciB0byBkb2luZyB0aGUNCj4gc29ja2V0IGFsbG9jYXRpb24sIGFuZCBoYXZp
bmcgbmZzX3JlbGVhc2VfcGFnZSBjaGVjayBmb3IgdGhhdCBmbGFnIHdoZW4NCj4gZGVjaWRpbmcg
d2hldGhlciB0byBkbyBhIGJsb2NraW5nIGNvbW1pdCBjYWxsLiBBbHNvLCBzZXQgUEZfRlNUUkFO
Uw0KPiB1bmNvbmRpdGlvbmFsbHkgaW4gcnBjX2FzeW5jX3NjaGVkdWxlIHNpbmNlIHRoYXQgZnVu
Y3Rpb24gY2FuIGFsc28gZG8NCj4gYWxsb2NhdGlvbnMgc29tZXRpbWVzLg0KPiANCj4gQWxzbywg
Y2MnaW5nIE1lbCBzaW5jZSBJIGJvcnJvd2VkIGhpcyB0c2tfcmVzdG9yZV9mbGFncyBoZWxwZXIg
aGVyZSwgYW5kDQo+IHRoaXMgcGF0Y2ggbWF5IGNhdXNlIG1lcmdlIGNvbmZsaWN0cyB3aXRoIHNv
bWUgb2YgaGlzIFN3YXAtb3Zlci1ORlMNCj4gcGF0Y2hlcy4NCj4gDQo+IENjOiBNZWwgR29ybWFu
IDxtZ29ybWFuQHN1c2UuZGU+DQo+IFNpZ25lZC1vZmYtYnk6IEplZmYgTGF5dG9uIDxqbGF5dG9u
QHJlZGhhdC5jb20+DQo+IC0tLQ0KPiAgZnMvbmZzL2ZpbGUuYyAgICAgICAgICAgICAgICAgICB8
ICAgIDggKysrKysrLS0NCj4gIGluY2x1ZGUvbGludXgvc2NoZWQuaCAgICAgICAgICAgfCAgICA3
ICsrKysrKysNCj4gIG5ldC9zdW5ycGMvc2NoZWQuYyAgICAgICAgICAgICAgfCAgICA0ICsrKysN
Cj4gIG5ldC9zdW5ycGMveHBydHJkbWEvdHJhbnNwb3J0LmMgfCAgICA0ICsrKy0NCj4gIG5ldC9z
dW5ycGMveHBydHNvY2suYyAgICAgICAgICAgfCAgIDEzICsrKysrKysrKysrKysNCj4gIDUgZmls
ZXMgY2hhbmdlZCwgMzMgaW5zZXJ0aW9ucygrKSwgMyBkZWxldGlvbnMoLSkNCj4gDQo+IGRpZmYg
LS1naXQgYS9mcy9uZnMvZmlsZS5jIGIvZnMvbmZzL2ZpbGUuYw0KPiBpbmRleCBhNjcwOGU2Yi4u
ODBjYzYyMSAxMDA2NDQNCj4gLS0tIGEvZnMvbmZzL2ZpbGUuYw0KPiArKysgYi9mcy9uZnMvZmls
ZS5jDQo+IEBAIC00NjMsOCArNDYzLDEyIEBAIHN0YXRpYyBpbnQgbmZzX3JlbGVhc2VfcGFnZShz
dHJ1Y3QgcGFnZSAqcGFnZSwgZ2ZwX3QgZ2ZwKQ0KPiAgCWlmIChtYXBwaW5nICYmIChnZnAgJiBH
RlBfS0VSTkVMKSA9PSBHRlBfS0VSTkVMKSB7DQo+ICAJCWludCBob3cgPSBGTFVTSF9TWU5DOw0K
PiAgDQo+IC0JCS8qIERvbid0IGxldCBrc3dhcGQgZGVhZGxvY2sgd2FpdGluZyBmb3IgT09NIFJQ
QyBjYWxscyAqLw0KPiAtCQlpZiAoY3VycmVudF9pc19rc3dhcGQoKSkNCj4gKwkJLyoNCj4gKwkJ
ICogRG9uJ3QgbGV0IGtzd2FwZCBkZWFkbG9jayB3YWl0aW5nIGZvciBPT00gUlBDIGNhbGxzLCBh
bmQNCj4gKwkJICogZG9uJ3QgcmVjdXJzZSBiYWNrIGludG8gdGhlIGZzIGlmIHdlIGtub3cgdGhh
dCB3ZSdyZSB0cnlpbmcNCj4gKwkJICogdG8gcmVjbGFpbSBtZW1vcnkgZm9yIGFuIGZzLXJlbGF0
ZWQgYWxsb2NhdGlvbi4NCj4gKwkJICovDQo+ICsJCWlmIChjdXJyZW50X2lzX2tzd2FwZCgpIHx8
IGN1cnJlbnQtPmZsYWdzICYgUEZfRlNUUkFOUykNCj4gIAkJCWhvdyA9IDA7DQo+ICAJCW5mc19j
b21taXRfaW5vZGUobWFwcGluZy0+aG9zdCwgaG93KTsNCg0KVGhpcyB3aWxsIHN0aWxsIHRyeSB0
byBkbyBhIGRpcmVjdCByZWNsYWltLiBFdmVuIGlmIHdlIG1ha2UgdGhhdCByZWNsYWltDQpub24t
YmxvY2tpbmcsIEknbSB0aGlua2luZyB0aGF0IHdlIHNob3VsZCBwZXJoYXBzIGp1c3Qgc2tpcCB0
aGUNCm5mc19jb21taXRfaW5vZGUoKSBpbiB0aGUgY2FzZSB3aGVyZSBQRl9GU1RSQU5TIGlzIHNl
dDogaWYgd2UncmUgbG93IG9uDQptZW1vcnksIHRoZW4gaXQgaXMgYmV0dGVyIHRvIHNraXAgdGhl
IG5mc19jb21taXRfaW5vZGUgcmF0aGVyIHRoYW4gdGllDQp1cCBtb3JlIHJlc291cmNlcyBpbiBh
biBhdHRlbXB0IHRvIGZyZWUgdXAgbWVtb3J5IHRoYXQgd2UgX2tub3dfIGlzDQpnb2luZyB0byBi
ZSBpbiB2YWluLg0KDQo+ICAJfQ0KPiBkaWZmIC0tZ2l0IGEvaW5jbHVkZS9saW51eC9zY2hlZC5o
IGIvaW5jbHVkZS9saW51eC9zY2hlZC5oDQo+IGluZGV4IDQwNTljMGYuLmQ3MWI1MjMgMTAwNjQ0
DQo+IC0tLSBhL2luY2x1ZGUvbGludXgvc2NoZWQuaA0KPiArKysgYi9pbmNsdWRlL2xpbnV4L3Nj
aGVkLmgNCj4gQEAgLTE4ODksNiArMTg4OSwxMyBAQCBzdGF0aWMgaW5saW5lIHZvaWQgcmN1X3N3
aXRjaF9mcm9tKHN0cnVjdCB0YXNrX3N0cnVjdCAqcHJldikNCj4gIA0KPiAgI2VuZGlmDQo+ICAN
Cj4gK3N0YXRpYyBpbmxpbmUgdm9pZCB0c2tfcmVzdG9yZV9mbGFncyhzdHJ1Y3QgdGFza19zdHJ1
Y3QgKnAsDQo+ICsJCQl1bnNpZ25lZCBsb25nIHBmbGFncywgdW5zaWduZWQgbG9uZyBtYXNrKQ0K
PiArew0KPiArCXAtPmZsYWdzICY9IH5tYXNrOw0KPiArCXAtPmZsYWdzIHw9IHBmbGFncyAmIG1h
c2s7DQo+ICt9DQoNCkRvIHdlIG5lZWQgdGhpcz8gcnBjaW9kIGlzIGEgd29ya3F1ZXVlIHRocmVh
ZCwgc28gaW4gcHJpbmNpcGxlIHdlDQphbHJlYWR5IF9rbm93XyB0aGF0IFBGX0ZTVFJBTlMgaXMg
bm90IGdvaW5nIHRvIGJlIHNldC4NCg0KDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBO
RlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNv
bQ0Kd3d3Lm5ldGFwcC5jb20NCg0K

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
  2012-06-28 15:02                       ` Myklebust, Trond
@ 2012-06-28 15:25                         ` Jeff Layton
  2012-07-02 15:17                           ` Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2012-06-28 15:25 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: harshula, linux-nfs, mgorman

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/file.c                   |    7 +++++--
 net/sunrpc/sched.c              |    2 ++
 net/sunrpc/xprtrdma/transport.c |    3 ++-
 net/sunrpc/xprtsock.c           |   10 ++++++++++
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a6708e6b..9075769 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -459,8 +459,11 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 
 	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
 
-	/* Only do I/O if gfp is a superset of GFP_KERNEL */
-	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
+	/* Only do I/O if gfp is a superset of GFP_KERNEL, and we're not
+	 * doing this memory reclaim for a fs-related allocation.
+	 */
+	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL &&
+	    !(current->flags & PF_FSTRANS)) {
 		int how = FLUSH_SYNC;
 
 		/* Don't let kswapd deadlock waiting for OOM RPC calls */
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 994cfea..eda32ae 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -790,7 +790,9 @@ void rpc_execute(struct rpc_task *task)
 
 static void rpc_async_schedule(struct work_struct *work)
 {
+	current->flags |= PF_FSTRANS;
 	__rpc_execute(container_of(work, struct rpc_task, u.tk_work));
+	current->flags &= ~PF_FSTRANS;
 }
 
 /**
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index b446e10..06cdbff 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -200,6 +200,7 @@ xprt_rdma_connect_worker(struct work_struct *work)
 	int rc = 0;
 
 	if (!xprt->shutdown) {
+		current->flags |= PF_FSTRANS;
 		xprt_clear_connected(xprt);
 
 		dprintk("RPC:       %s: %sconnect\n", __func__,
@@ -212,10 +213,10 @@ xprt_rdma_connect_worker(struct work_struct *work)
 
 out:
 	xprt_wake_pending_tasks(xprt, rc);
-
 out_clear:
 	dprintk("RPC:       %s: exit\n", __func__);
 	xprt_clear_connecting(xprt);
+	current->flags &= ~PF_FSTRANS;
 }
 
 /*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 890b03f..b88c6bf 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1895,6 +1895,8 @@ static void xs_local_setup_socket(struct work_struct *work)
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 	status = __sock_create(xprt->xprt_net, AF_LOCAL,
 					SOCK_STREAM, 0, &sock, 1);
@@ -1928,6 +1930,7 @@ static void xs_local_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	current->flags &= ~PF_FSTRANS;
 }
 
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
@@ -1970,6 +1973,8 @@ static void xs_udp_setup_socket(struct work_struct *work)
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 	sock = xs_create_sock(xprt, transport,
@@ -1988,6 +1993,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	current->flags &= ~PF_FSTRANS;
 }
 
 /*
@@ -2113,6 +2119,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	if (xprt->shutdown)
 		goto out;
 
+	current->flags |= PF_FSTRANS;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = xs_create_sock(xprt, transport,
@@ -2162,6 +2170,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	case -EINPROGRESS:
 	case -EALREADY:
 		xprt_clear_connecting(xprt);
+		current->flags &= ~PF_FSTRANS;
 		return;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
@@ -2174,6 +2183,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	current->flags &= ~PF_FSTRANS;
 }
 
 /**
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
  2012-06-28 15:25                         ` [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons Jeff Layton
@ 2012-07-02 15:17                           ` Jeff Layton
  2012-07-02 19:40                             ` Myklebust, Trond
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2012-07-02 15:17 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: harshula, linux-nfs, mgorman, jan.kratochvil

On Thu, 28 Jun 2012 11:25:13 -0400
Jeff Layton <jlayton@redhat.com> wrote:

> We've had some reports of a deadlock where rpciod ends up with a stack
> trace like this:
> 
>     PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
>      #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
>      #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
>      #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
>      #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
>      #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
>      #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
>      #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
>      #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
>      #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
>      #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
>     #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
>     #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
>     #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
>     #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
>     #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
>     #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
>     #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
>     #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
>     #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
>     #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
>     #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
>     #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
>     #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
>     #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
>     #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
>     #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
> 
> rpciod is trying to allocate memory for a new socket to talk to the
> server. The VM ends up calling ->releasepage to get more memory, and it
> tries to do a blocking commit. That commit can't succeed however without
> a connected socket, so we deadlock.
> 
> Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
> socket allocation, and having nfs_release_page check for that flag when
> deciding whether to do a commit call. Also, set PF_FSTRANS
> unconditionally in rpc_async_schedule since that function can also do
> allocations sometimes.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/nfs/file.c                   |    7 +++++--
>  net/sunrpc/sched.c              |    2 ++
>  net/sunrpc/xprtrdma/transport.c |    3 ++-
>  net/sunrpc/xprtsock.c           |   10 ++++++++++
>  4 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index a6708e6b..9075769 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -459,8 +459,11 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
>  
>  	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
>  
> -	/* Only do I/O if gfp is a superset of GFP_KERNEL */
> -	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> +	/* Only do I/O if gfp is a superset of GFP_KERNEL, and we're not
> +	 * doing this memory reclaim for a fs-related allocation.
> +	 */
> +	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL &&
> +	    !(current->flags & PF_FSTRANS)) {
>  		int how = FLUSH_SYNC;
>  
>  		/* Don't let kswapd deadlock waiting for OOM RPC calls */
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 994cfea..eda32ae 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -790,7 +790,9 @@ void rpc_execute(struct rpc_task *task)
>  
>  static void rpc_async_schedule(struct work_struct *work)
>  {
> +	current->flags |= PF_FSTRANS;
>  	__rpc_execute(container_of(work, struct rpc_task, u.tk_work));
> +	current->flags &= ~PF_FSTRANS;
>  }
>  
>  /**
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index b446e10..06cdbff 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -200,6 +200,7 @@ xprt_rdma_connect_worker(struct work_struct *work)
>  	int rc = 0;
>  
>  	if (!xprt->shutdown) {
> +		current->flags |= PF_FSTRANS;
>  		xprt_clear_connected(xprt);
>  
>  		dprintk("RPC:       %s: %sconnect\n", __func__,
> @@ -212,10 +213,10 @@ xprt_rdma_connect_worker(struct work_struct *work)
>  
>  out:
>  	xprt_wake_pending_tasks(xprt, rc);
> -
>  out_clear:
>  	dprintk("RPC:       %s: exit\n", __func__);
>  	xprt_clear_connecting(xprt);
> +	current->flags &= ~PF_FSTRANS;
>  }
>  
>  /*
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 890b03f..b88c6bf 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1895,6 +1895,8 @@ static void xs_local_setup_socket(struct work_struct *work)
>  	if (xprt->shutdown)
>  		goto out;
>  
> +	current->flags |= PF_FSTRANS;
> +
>  	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
>  	status = __sock_create(xprt->xprt_net, AF_LOCAL,
>  					SOCK_STREAM, 0, &sock, 1);
> @@ -1928,6 +1930,7 @@ static void xs_local_setup_socket(struct work_struct *work)
>  out:
>  	xprt_clear_connecting(xprt);
>  	xprt_wake_pending_tasks(xprt, status);
> +	current->flags &= ~PF_FSTRANS;
>  }
>  
>  static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> @@ -1970,6 +1973,8 @@ static void xs_udp_setup_socket(struct work_struct *work)
>  	if (xprt->shutdown)
>  		goto out;
>  
> +	current->flags |= PF_FSTRANS;
> +
>  	/* Start by resetting any existing state */
>  	xs_reset_transport(transport);
>  	sock = xs_create_sock(xprt, transport,
> @@ -1988,6 +1993,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
>  out:
>  	xprt_clear_connecting(xprt);
>  	xprt_wake_pending_tasks(xprt, status);
> +	current->flags &= ~PF_FSTRANS;
>  }
>  
>  /*
> @@ -2113,6 +2119,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
>  	if (xprt->shutdown)
>  		goto out;
>  
> +	current->flags |= PF_FSTRANS;
> +
>  	if (!sock) {
>  		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
>  		sock = xs_create_sock(xprt, transport,
> @@ -2162,6 +2170,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
>  	case -EINPROGRESS:
>  	case -EALREADY:
>  		xprt_clear_connecting(xprt);
> +		current->flags &= ~PF_FSTRANS;
>  		return;
>  	case -EINVAL:
>  		/* Happens, for instance, if the user specified a link
> @@ -2174,6 +2183,7 @@ out_eagain:
>  out:
>  	xprt_clear_connecting(xprt);
>  	xprt_wake_pending_tasks(xprt, status);
> +	current->flags &= ~PF_FSTRANS;
>  }
>  
>  /**

I've given a backported version of the above patch to a customer and
they're testing it now. It may be some time before we have any results
however. In the meantime, someone reported this bug against Fedora 16
which is a similar situation:

    https://bugzilla.redhat.com/show_bug.cgi?id=834808

Here's the stack trace of interest:

# cat /proc/pid-of-openvpn/stack 
[<ffffffffa08484a9>] nfs_wait_bit_killable+0x39/0x90 [nfs]
[<ffffffffa085737a>] nfs_commit_inode+0xaa/0x250 [nfs]
[<ffffffffa0845f14>] nfs_release_page+0x84/0xa0 [nfs]
[<ffffffff811205f2>] try_to_release_page+0x32/0x50
[<ffffffff81133e62>] shrink_page_list+0x792/0x9a0
[<ffffffff81134484>] shrink_inactive_list+0x184/0x4f0
[<ffffffff81134fc8>] shrink_mem_cgroup_zone+0x448/0x5d0
[<ffffffff811351c6>] shrink_zone+0x76/0xa0
[<ffffffff81135564>] do_try_to_free_pages+0x104/0x5c0
[<ffffffff81135cfb>] try_to_free_pages+0xab/0x170
[<ffffffff811299dc>] __alloc_pages_nodemask+0x53c/0x8f0
[<ffffffff81161273>] alloc_pages_current+0xa3/0x110
[<ffffffff8116c2e5>] new_slab+0x245/0x2f0
[<ffffffff815ed28d>] __slab_alloc+0x32b/0x440
[<ffffffff8116f2b7>] __kmalloc_node_track_caller+0x97/0x1f0
[<ffffffff814d9678>] __alloc_skb+0x78/0x230
[<ffffffff814d3fbc>] sock_alloc_send_pskb+0x1ac/0x2f0
[<ffffffff814d4115>] sock_alloc_send_skb+0x15/0x20
[<ffffffff81520861>] __ip_append_data+0x711/0xa10
[<ffffffff81522bd0>] ip_make_skb+0xd0/0x110
[<ffffffff815464ef>] udp_sendmsg+0x2df/0x990
[<ffffffff81550f24>] inet_sendmsg+0x64/0xb0
[<ffffffff814cedc7>] sock_sendmsg+0x117/0x130
[<ffffffff814d22bd>] sys_sendto+0x13d/0x190
[<ffffffff815fc5e9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Here, they're mounting over openvpn and it deadlocks in a similar
fashion. The openvpn process is to get memory to send a frame on the
"physical" connection, but that won't happen until the commit comes
back.

Perhaps we should also take something like the (untested) patch below
on top of the patch above? Note that kswapd always has PF_MEMALLOC set:

-----------------[snip]--------------------

nfs: broaden the cases where we use a non-blocking commit in releasepage

Currently, we just do a non-blocking commit when called from kswapd, but
that still gives us some cases where we end up blocking after recursing
back into the filesystem. Instead, turn that check into a check for
PF_MEMALLOC so that we never block when trying to free memory in order to
satisfy an allocation.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/file.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 9075769..61d3670 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -466,8 +466,8 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
 	    !(current->flags & PF_FSTRANS)) {
 		int how = FLUSH_SYNC;
 
-		/* Don't let kswapd deadlock waiting for OOM RPC calls */
-		if (current_is_kswapd())
+		/* Don't block if we're freeing for a memory allocation */
+		if (current->flags & PF_MEMALLOC)
 			how = 0;
 		nfs_commit_inode(mapping->host, how);
 	}
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
  2012-07-02 15:17                           ` Jeff Layton
@ 2012-07-02 19:40                             ` Myklebust, Trond
  2012-07-03 11:20                               ` Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Myklebust, Trond @ 2012-07-02 19:40 UTC (permalink / raw)
  To: Jeff Layton; +Cc: harshula, linux-nfs, mgorman, jan.kratochvil

T24gTW9uLCAyMDEyLTA3LTAyIGF0IDExOjE3IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
bmZzOiBicm9hZGVuIHRoZSBjYXNlcyB3aGVyZSB3ZSB1c2UgYSBub24tYmxvY2tpbmcgY29tbWl0
IGluIHJlbGVhc2VwYWdlDQo+IA0KPiBDdXJyZW50bHksIHdlIGp1c3QgZG8gYSBub24tYmxvY2tp
bmcgY29tbWl0IHdoZW4gY2FsbGVkIGZyb20ga3N3YXBkLCBidXQNCj4gdGhhdCBzdGlsbCBnaXZl
cyB1cyBzb21lIGNhc2VzIHdoZXJlIHdlIGVuZCB1cCBibG9ja2luZyBhZnRlciByZWN1cnNpbmcN
Cj4gYmFjayBpbnRvIHRoZSBmaWxlc3lzdGVtLiBJbnN0ZWFkLCB0dXJuIHRoYXQgY2hlY2sgaW50
byBhIGNoZWNrIGZvcg0KPiBQRl9NRU1BTExPQyBzbyB0aGF0IHdlIG5ldmVyIGJsb2NrIHdoZW4g
dHJ5aW5nIHRvIGZyZWUgbWVtb3J5IGluIG9yZGVyIHRvDQo+IHNhdGlzZnkgYW4gYWxsb2NhdGlv
bi4NCj4gDQo+IFNpZ25lZC1vZmYtYnk6IEplZmYgTGF5dG9uIDxqbGF5dG9uQHJlZGhhdC5jb20+
DQo+IC0tLQ0KPiAgZnMvbmZzL2ZpbGUuYyB8ICAgIDQgKystLQ0KPiAgMSBmaWxlcyBjaGFuZ2Vk
LCAyIGluc2VydGlvbnMoKyksIDIgZGVsZXRpb25zKC0pDQo+IA0KPiBkaWZmIC0tZ2l0IGEvZnMv
bmZzL2ZpbGUuYyBiL2ZzL25mcy9maWxlLmMNCj4gaW5kZXggOTA3NTc2OS4uNjFkMzY3MCAxMDA2
NDQNCj4gLS0tIGEvZnMvbmZzL2ZpbGUuYw0KPiArKysgYi9mcy9uZnMvZmlsZS5jDQo+IEBAIC00
NjYsOCArNDY2LDggQEAgc3RhdGljIGludCBuZnNfcmVsZWFzZV9wYWdlKHN0cnVjdCBwYWdlICpw
YWdlLCBnZnBfdCBnZnApDQo+ICAJICAgICEoY3VycmVudC0+ZmxhZ3MgJiBQRl9GU1RSQU5TKSkg
ew0KPiAgCQlpbnQgaG93ID0gRkxVU0hfU1lOQzsNCj4gIA0KPiAtCQkvKiBEb24ndCBsZXQga3N3
YXBkIGRlYWRsb2NrIHdhaXRpbmcgZm9yIE9PTSBSUEMgY2FsbHMgKi8NCj4gLQkJaWYgKGN1cnJl
bnRfaXNfa3N3YXBkKCkpDQo+ICsJCS8qIERvbid0IGJsb2NrIGlmIHdlJ3JlIGZyZWVpbmcgZm9y
IGEgbWVtb3J5IGFsbG9jYXRpb24gKi8NCj4gKwkJaWYgKGN1cnJlbnQtPmZsYWdzICYgUEZfTUVN
QUxMT0MpDQo+ICAJCQlob3cgPSAwOw0KPiAgCQluZnNfY29tbWl0X2lub2RlKG1hcHBpbmctPmhv
c3QsIGhvdyk7DQo+ICAJfQ0KDQpVbW0uLi4gU28gd2hpY2ggcHJvY2VzcyB3aWxsIGFjdHVhbGx5
IGNhbGwgbmZzX3JlbGVhc2VfcGFnZSgpIHdpdGgNCkdGUF9LRVJORUwsIGJ1dCB3aXRob3V0IHRo
ZSBQRl9NRU1BTExPQyBmbGFnIGJlaW5nIHNldD8NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxp
bnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRh
cHAuY29tDQp3d3cubmV0YXBwLmNvbQ0KDQo=

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
  2012-07-02 19:40                             ` Myklebust, Trond
@ 2012-07-03 11:20                               ` Jeff Layton
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2012-07-03 11:20 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: harshula, linux-nfs, mgorman, jan.kratochvil

On Mon, 2 Jul 2012 19:40:22 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Mon, 2012-07-02 at 11:17 -0400, Jeff Layton wrote:
> > nfs: broaden the cases where we use a non-blocking commit in releasepage
> > 
> > Currently, we just do a non-blocking commit when called from kswapd, but
> > that still gives us some cases where we end up blocking after recursing
> > back into the filesystem. Instead, turn that check into a check for
> > PF_MEMALLOC so that we never block when trying to free memory in order to
> > satisfy an allocation.
> > 
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  fs/nfs/file.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> > index 9075769..61d3670 100644
> > --- a/fs/nfs/file.c
> > +++ b/fs/nfs/file.c
> > @@ -466,8 +466,8 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
> >  	    !(current->flags & PF_FSTRANS)) {
> >  		int how = FLUSH_SYNC;
> >  
> > -		/* Don't let kswapd deadlock waiting for OOM RPC calls */
> > -		if (current_is_kswapd())
> > +		/* Don't block if we're freeing for a memory allocation */
> > +		if (current->flags & PF_MEMALLOC)
> >  			how = 0;
> >  		nfs_commit_inode(mapping->host, how);
> >  	}
> 
> Umm... So which process will actually call nfs_release_page() with
> GFP_KERNEL, but without the PF_MEMALLOC flag being set?
> 

Well...there are a number of callers of try_to_release_page with
GFP_KERNEL that are not involved with reclaim:

The migrate_page codepaths and the splice code, for instance. Also
invalidate_complete_page2, which we call when invalidating an inode and
maybe also when truncating? Those are almost certainly less traveled
than the reclaim codepaths though.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-07-03 11:21 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-15 12:54 rpciod process is blocked in nfs_release_page waiting for nfs_commit_inode to complete Harshula
2012-06-15 13:21 ` Jeff Layton
2012-06-15 21:25   ` Myklebust, Trond
2012-06-20 14:14     ` Jeff Layton
2012-06-28 14:31       ` Mel Gorman
2012-06-28 14:55         ` Jeff Layton
2012-06-27 15:54     ` Jeff Layton
2012-06-27 18:43       ` Myklebust, Trond
2012-06-27 19:28         ` Jeff Layton
2012-06-27 19:37           ` Myklebust, Trond
2012-06-27 20:18             ` Jeff Layton
2012-06-27 19:56           ` Myklebust, Trond
2012-06-27 20:19             ` Jeff Layton
2012-06-27 20:46               ` Myklebust, Trond
2012-06-28 13:19                 ` Jeff Layton
2012-06-28 13:40                   ` Myklebust, Trond
2012-06-28 14:47                     ` Jeff Layton
2012-06-28 14:51                     ` [RFC PATCH] nfs: do non-blocking commit in releasepage if we're attempting to reconnect the socket Jeff Layton
2012-06-28 15:02                       ` Myklebust, Trond
2012-06-28 15:25                         ` [RFC PATCH v2] nfs: skip commit in releasepage if we're freeing memory for fs-related reasons Jeff Layton
2012-07-02 15:17                           ` Jeff Layton
2012-07-02 19:40                             ` Myklebust, Trond
2012-07-03 11:20                               ` Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.