On Mon, 25 Aug 2014 16:05:01 +1000 NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org> wrote:

> On Fri, 22 Aug 2014 15:55:30 +0800 Junxiao Bi <junxiao.bi-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > Hi All,
> > 
> > I hit an NFS hang that looks like a deadlock in "rpciod". The bug was
> > reported on 2.6.32, but from reading the source, mainline seems to
> > suffer from it as well.
> > 
> > See the following rpciod trace. rpciod allocated memory using GFP_KERNEL
> > in xs_setup_xprt(). With too little memory available, that triggered
> > direct reclaim, which waited for writeback of a page to complete. But
> > that page was an NFS page, and its writeback depended on rpciod. So this
> > caused a deadlock.
> > 
> > I am not sure how to fix this issue. Replacing GFP_KERNEL with GFP_NOFS in
> > xs_setup_xprt() can fix this trace, but there are other places that
> > allocate memory with GFP_KERNEL in rpciod, like
> > xs_tcp_setup_socket()->xs_create_sock()->__sock_create()->sock_alloc(),
> > and there is no way to pass GFP_NOFS to the common network code. Also,
> > mainline no longer checks ___GFP_FS before waiting for page writeback
> > to complete.
> > Upstream commit 5cf02d0 (nfs: skip commit in releasepage if we're
> > freeing memory for fs-related reasons) uses PF_FSTRANS to avoid another
> > deadlock during direct reclaim. I am wondering whether we can check the
> > PF_FSTRANS flag in shrink_page_list() and, if it is set, not wait for
> > any page writeback to complete. I saw this flag is also used by xfs, and
> > I am not sure whether this would affect xfs.
> > 
> > Any advice is appreciated.
> 
> This problem shouldn't affect mainline.
> 
> Since Linux 3.2, "direct reclaim" never waits for writeback - that is left for
> kswapd to do. (See "A pivotal patch" in https://lwn.net/Articles/595652/)
> So this deadlock cannot happen.

Sorry, that might not quite be right.  That change meant that direct reclaim
would never *initiate* writeout.  It can sometimes wait for it.
Sorry.

NeilBrown

> 
> Probably the simplest fix for your deadlock would be:
> - in shrink_page_list, clear may_enter_fs if PF_FSTRANS is set.
> - in rpc_async_schedule, set PF_FSTRANS before calling __rpc_execute, and
>   clear it again afterwards.
> 
> NeilBrown
>
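
For illustration, here is a rough user-space sketch of the flag logic in the two steps quoted above. It is not kernel code and not a patch: every model_* and MODEL_* name is made up for this example, the bit values are placeholders, and the callback merely stands in for __rpc_execute(). It only shows that an allocation made while the flag is set would see may_enter_fs cleared; it says nothing about whether that is enough to avoid every wait on writeback.

```c
/* User-space model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values; the real ones live in the kernel headers. */
#define MODEL_GFP_FS     0x80u       /* stands in for __GFP_FS */
#define MODEL_PF_FSTRANS 0x00020000u /* stands in for PF_FSTRANS */

struct model_task {
    unsigned int flags;              /* stands in for current->flags */
};

/*
 * Step 1: the check proposed for shrink_page_list(). Reclaim may enter
 * the filesystem only if the allocation allows it and the task is not
 * marked with the PF_FSTRANS stand-in.
 */
static bool model_may_enter_fs(const struct model_task *t, unsigned int gfp_mask)
{
    bool may_enter_fs = (gfp_mask & MODEL_GFP_FS) != 0;

    if (t->flags & MODEL_PF_FSTRANS)
        may_enter_fs = false;
    return may_enter_fs;
}

/*
 * Step 2: the wrapping proposed for rpc_async_schedule(). The callback
 * stands in for __rpc_execute() and returns the may_enter_fs value that
 * an allocation made inside it would see.
 */
static bool model_rpc_async_schedule(struct model_task *t,
                                     bool (*execute)(const struct model_task *))
{
    bool seen;

    assert(execute);
    t->flags |= MODEL_PF_FSTRANS;   /* set before the work runs */
    seen = execute(t);
    t->flags &= ~MODEL_PF_FSTRANS;  /* clear it again afterwards */
    return seen;
}

/* A GFP_KERNEL-style allocation (FS bit set) made from inside the work. */
static bool model_alloc_in_rpciod(const struct model_task *t)
{
    return model_may_enter_fs(t, MODEL_GFP_FS);
}
```

Calling model_rpc_async_schedule(&t, model_alloc_in_rpciod) on a task with no flags set returns false, while the same allocation outside the wrapper sees true. A real change would probably want to save and restore the previous flag state rather than clear it unconditionally, in case the flag was already set on entry.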