On Mon, 25 Aug 2014 16:05:01 +1000 NeilBrown wrote:

> On Fri, 22 Aug 2014 15:55:30 +0800 Junxiao Bi wrote:
>
> > Hi All,
> >
> > I got an nfs hung issue, looks like "rpciod" run into deadlock. Bug is
> > reported on 2.6.32, but seems mainline also suffers this bug from the
> > source code.
> >
> > See the following rpciod trace. rpciod allocated memory using GFP_KERNEL
> > in xs_setup_xprt(). That triggered direct reclaim when available memory
> > was not enough, where it waited an write-back page done, but that page
> > was a nfs page, and it depended on rpciod to write back. So this caused
> > a deadlock.
> >
> > I am not sure how to fix this issue. Replace GFP_KERNEL with GFP_NOFS in
> > xs_setup_xprt() can fix this trace, but there are other place allocating
> > memory with GFP_KERNEL in rpciod, like
> > xs_tcp_setup_socket()->xs_create_sock()->__sock_create()->sock_alloc(),
> > there is no way to pass GFP_NOFS to network command code. Also mainline
> > has changed to not care ___GFP_FS before waiting page write back done.
> > Upstream commit 5cf02d0 (nfs: skip commit in releasepage if we're
> > freeing memory for fs-related reasons) uses PF_FSTRANS to avoid another
> > deadlock when direct reclaim, i am thinking whether we can check
> > PF_FSTRANS flag in shrink_page_list(), if this flag is set, it will not
> > wait any page write back done? I saw this flag is also used by xfs, not
> > sure whether this will affect xfs.
> >
> > Any advices is appreciated.
>
> This problem shouldn't affect mainline.
>
> Since Linux 3.2, "direct reclaim" never wait for writeback - that is left for
> kswapd to do.  (See "A pivotal patch" in https://lwn.net/Articles/595652/)
> So this deadlock cannot happen.

Sorry, that might not quite be right.  That change meant that direct reclaim
would never *initiate* writeout.  It can sometimes wait for it.  Sorry.

NeilBrown

>
> Probably the simplest fix for your deadlock would be:
>  - in shrink_page_list, clear may_enter_fs if PF_FSTRANS is set.
>  - in rpc_async_schedule, set PF_FSTRANS before calling __rpc_execute, and
>    clear it again afterwards.
>
> NeilBrown
>
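
For illustration only, the two-step suggestion quoted above could be modelled
in userspace C roughly as follows.  This is a sketch under stated assumptions,
not a kernel patch: the struct and function names ending in _model, the
reclaim_may_enter_fs() helper, and the numeric value given to PF_FSTRANS are
all hypothetical stand-ins, and the real shrink_page_list() and
rpc_async_schedule() take different arguments.  It restores the previous flag
state afterwards rather than unconditionally clearing it, which is a choice
made for this sketch.

```c
/* Userspace model only; nothing here is actual kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_FSTRANS 0x1u /* placeholder bit, not the kernel's real value */

/* Hypothetical stand-in for the per-task flags word. */
struct task_model {
	unsigned int flags;
};

/*
 * Reclaim side: models "clear may_enter_fs if PF_FSTRANS is set".
 * gfp_allows_fs stands in for whatever the allocation mask permitted.
 */
static bool reclaim_may_enter_fs(const struct task_model *tsk,
				 bool gfp_allows_fs)
{
	assert(tsk != NULL);
	if (tsk->flags & PF_FSTRANS)
		return false; /* never wait on fs writeback from here */
	return gfp_allows_fs;
}

/* Hypothetical callback type standing in for __rpc_execute. */
typedef void (*rpc_work_fn)(struct task_model *tsk, void *arg);

/*
 * rpciod side: set PF_FSTRANS around the work, then put the flag back
 * to whatever it was before the call.
 */
static void rpc_async_schedule_model(struct task_model *tsk,
				     rpc_work_fn execute, void *arg)
{
	unsigned int saved = tsk->flags & PF_FSTRANS;

	tsk->flags |= PF_FSTRANS;
	execute(tsk, arg);
	tsk->flags = (tsk->flags & ~PF_FSTRANS) | saved;
}

/* Example work item: record what reclaim would decide while it runs. */
static void record_reclaim_decision(struct task_model *tsk, void *arg)
{
	bool *may_enter_fs = arg;

	*may_enter_fs = reclaim_may_enter_fs(tsk, true);
}
```

In this model, an allocation made while the work item runs sees may_enter_fs
as false even though the allocation itself asked for GFP_KERNEL-like
behaviour, which is the property the suggestion relies on: no GFP_NOFS has to
be threaded through the socket-creation call chain.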