From: Dave Chinner <david@fromorbit.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-mm@kvack.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH 16/19] VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc.
Date: Wed, 16 Apr 2014 19:00:51 +1000
Message-ID: <20140416090051.GK15995@dastard>
In-Reply-To: <20140416164941.37587da6@notabene.brown>

On Wed, Apr 16, 2014 at 04:49:41PM +1000, NeilBrown wrote:
> On Wed, 16 Apr 2014 16:25:20 +1000 Dave Chinner <david@fromorbit.com> wrote:
> 
> > On Wed, Apr 16, 2014 at 02:03:37PM +1000, NeilBrown wrote:
> > > __d_alloc can be called with i_mutex held, so it is safer to
> > > use GFP_NOFS.
> > > 
> > > lockdep reports this can deadlock when loop-back NFS is in use,
> > > as nfsd may be required to write out for reclaim, and nfsd certainly
> > > takes i_mutex.
> > 
> > But not the same i_mutex as is currently held. To me, this seems
> > like a false positive? If you are holding the i_mutex on an inode,
> > then you have a reference to the inode and hence memory reclaim
> > won't ever take the i_mutex on that inode.
> > 
> > FWIW, this sort of false positive was a long-standing problem for
> > XFS - we managed to get rid of most of the false positives like this
> > by ensuring that only the ilock is taken within memory reclaim and
> > memory reclaim can't be entered while we hold the ilock.
> > 
> > You can't do that with the i_mutex, though....
> > 
> > Cheers,
> > 
> > Dave.
> 
> I'm not sure this is a false positive.
> You can call __d_alloc when creating a file and so are holding i_mutex on the
> directory.
> nfsd might also want to access that directory.
> 
> If there was only 1 nfsd thread, it would need to get i_mutex and do its
> thing before replying to that request and so before it could handle the
> COMMIT which __d_alloc is waiting for.

That seems wrong - the NFS client in __d_alloc holds a mutex on an
NFS client directory inode. The NFS server can't access that
specific mutex - it's on the other side of the "network". The NFS
server accesses mutexes from local filesystems, so __d_alloc would
have to be blocked on a local filesystem inode i_mutex for the nfsd
to get hung up behind it...

However, my confusion comes from the fact that we do GFP_KERNEL
memory allocation with the i_mutex held all over the place. If the
problem is:

	local fs access -> i_mutex
.....
	nfsd -> i_mutex (blocked)
.....
	local fs access -> kmalloc(GFP_KERNEL)
			-> direct reclaim
			-> nfs_release_page
			-> <send write/commit request to blocked nfsds>
			   <deadlock>

then why is it just __d_alloc that needs this fix?  Either this is a
problem *everywhere* or it's not a problem at all.

If it's a problem everywhere it means that we simply can't allow
reclaim from localhost NFS mounts to run from contexts that could
block an NFSD. i.e. you cannot run NFS client memory reclaim from
filesystems that are NFS server exported filesystems.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview:
2014-04-16  4:03 [PATCH/RFC 00/19] Support loop-back NFS mounts NeilBrown
2014-04-16  4:03 ` [PATCH 03/19] lockdep: improve scenario messages for RECLAIM_FS errors NeilBrown
2014-04-16  7:22   ` Peter Zijlstra
2014-04-16  4:03 ` [PATCH 06/19] nfsd: set PF_FSTRANS for nfsd threads NeilBrown
2014-04-16  7:28   ` Peter Zijlstra
2014-04-16  4:03 ` [PATCH 14/19] driver core: set PF_FSTRANS while holding gdp_mutex NeilBrown
2014-04-16  4:03 ` [PATCH 07/19] nfsd and VM: use PF_LESS_THROTTLE to avoid throttle in shrink_inactive_list NeilBrown
2014-04-16  4:03 ` [PATCH 11/19] FS: set PF_FSTRANS while holding mmap_sem in exec.c NeilBrown
2014-04-16  4:03 ` [PATCH 01/19] Promote current_{set, restore}_flags_nested from xfs to global NeilBrown
2014-04-16  4:03 ` [PATCH 04/19] Make effect of PF_FSTRANS to disable __GFP_FS universal NeilBrown
2014-04-16  5:37   ` Dave Chinner
2014-04-16  6:17     ` NeilBrown
2014-04-17  1:03       ` NeilBrown
2014-04-17  4:41         ` Dave Chinner
2014-04-16  4:03 ` [PATCH 09/19] XFS: ensure xfs_file_*_read cannot deadlock in memory allocation NeilBrown
2014-04-16  6:04   ` Dave Chinner
2014-04-16  6:27     ` NeilBrown
2014-04-16  6:31     ` Dave Chinner
2014-04-16  4:03 ` [PATCH 05/19] SUNRPC: track whether a request is coming from a loop-back interface NeilBrown
2014-04-16 14:47   ` Jeff Layton
2014-04-16 23:25     ` NeilBrown
2014-04-16  4:03 ` [PATCH 02/19] lockdep: lockdep_set_current_reclaim_state should save old value NeilBrown
2014-04-16  4:03 ` [PATCH 10/19] NET: set PF_FSTRANS while holding sk_lock NeilBrown
2014-04-16  5:13   ` Eric Dumazet
2014-04-16  5:47     ` NeilBrown
2014-04-16 13:00     ` David Miller
2014-04-17  2:38       ` NeilBrown
2014-04-16  4:03 ` [PATCH 13/19] MM: set PF_FSTRANS while allocating per-cpu memory to avoid deadlock NeilBrown
2014-04-16  5:49   ` Dave Chinner
2014-04-16  6:22     ` NeilBrown
2014-04-16  6:30       ` Dave Chinner
2014-04-16  4:03 ` [PATCH 12/19] NET: set PF_FSTRANS while holding rtnl_lock NeilBrown
2014-04-16  4:03 ` [PATCH 08/19] Set PF_FSTRANS while write_cache_pages calls ->writepage NeilBrown
2014-04-16  4:03 ` [PATCH 18/19] nfsd: set PF_FSTRANS during nfsd4_do_callback_rpc NeilBrown
2014-04-16  4:03 ` [PATCH 17/19] VFS: set PF_FSTRANS while namespace_sem is held NeilBrown
2014-04-16  4:46   ` Al Viro
2014-04-16  5:52     ` NeilBrown
2014-04-16 16:37       ` Al Viro
2014-04-16  4:03 ` [PATCH 15/19] nfsd: set PF_FSTRANS when client_mutex " NeilBrown
2014-04-16  4:03 ` [PATCH 16/19] VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc NeilBrown
2014-04-16  6:25   ` Dave Chinner
2014-04-16  6:49     ` NeilBrown
2014-04-16  9:00       ` Dave Chinner [this message]
2014-04-17  0:51         ` NeilBrown
2014-04-17  5:58           ` Dave Chinner
2014-04-16  4:03 ` [PATCH 19/19] XFS: set PF_FSTRANS while ilock is held in xfs_free_eofblocks NeilBrown
2014-04-16  6:18   ` Dave Chinner
2014-04-16 14:42 ` [PATCH/RFC 00/19] Support loop-back NFS mounts Jeff Layton
2014-04-17  0:20   ` NeilBrown
2014-04-17  1:27     ` Dave Chinner
2014-04-17  1:50       ` NeilBrown
2014-04-17  4:23         ` Dave Chinner
