On Wed, 16 Apr 2014 19:00:51 +1000 Dave Chinner wrote:

> On Wed, Apr 16, 2014 at 04:49:41PM +1000, NeilBrown wrote:
> > On Wed, 16 Apr 2014 16:25:20 +1000 Dave Chinner wrote:
> >
> > > On Wed, Apr 16, 2014 at 02:03:37PM +1000, NeilBrown wrote:
> > > > __d_alloc can be called with i_mutex held, so it is safer to
> > > > use GFP_NOFS.
> > > >
> > > > lockdep reports this can deadlock when loop-back NFS is in use,
> > > > as nfsd may be required to write out for reclaim, and nfsd certainly
> > > > takes i_mutex.
> > >
> > > But not the same i_mutex as is currently held. To me, this seems
> > > like a false positive? If you are holding the i_mutex on an inode,
> > > then you have a reference to the inode and hence memory reclaim
> > > won't ever take the i_mutex on that inode.
> > >
> > > FWIW, this sort of false positive was a long standing problem for
> > > XFS - we managed to get rid of most of the false positives like this
> > > by ensuring that only the ilock is taken within memory reclaim and
> > > memory reclaim can't be entered while we hold the ilock.
> > >
> > > You can't do that with the i_mutex, though....
> > >
> > > Cheers,
> > >
> > > Dave.
> >
> > I'm not sure this is a false positive.
> > You can call __d_alloc when creating a file and so are holding i_mutex on the
> > directory.
> > nfsd might also want to access that directory.
> >
> > If there was only 1 nfsd thread, it would need to get i_mutex and do its
> > thing before replying to that request and so before it could handle the
> > COMMIT which __d_alloc is waiting for.
>
> That seems wrong - the NFS client in __d_alloc holds a mutex on a
> NFS client directory inode. The NFS server can't access that
> specific mutex - it's on the other side of the "network". The NFS
> server accesses mutexes from local filesystems, so __d_alloc would
> have to be blocked on a local filesystem inode i_mutex for the nfsd
> to get hung up behind it...

I'm not thinking of mutexes on the NFS inodes but the local filesystem inodes
exactly as you describe below.

>
> However, my confusion comes from the fact that we do GFP_KERNEL
> memory allocation with the i_mutex held all over the place.

Do we?  Should we?  Isn't the whole point of GFP_NOFS to use it when holding
any filesystem lock?

> If the
> problem is:
>
>	local fs access -> i_mutex
> .....
>	nfsd -> i_mutex (blocked)
> .....
>	local fs access -> kmalloc(GFP_KERNEL)
>			-> direct reclaim
>			-> nfs_release_page
>			->
>
>
> then why is it just __d_alloc that needs this fix? Either this is a
> problem *everywhere* or it's not a problem at all.

I think it is a problem everywhere that it is a problem :-)
If you are holding an FS lock, then you should be using GFP_NOFS.
Currently a given filesystem can get away with sometimes using GFP_KERNEL
because that particular lock never causes contention during reclaim for that
particular filesystem.

Adding loop-back NFS into the mix broadens the number of locks which can
cause a problem as it creates interdependencies between different filesystems.

>
> If it's a problem everywhere it means that we simply can't allow
> reclaim from localhost NFS mounts to run from contexts that could
> block an NFSD. i.e. you cannot run NFS client memory reclaim from
> filesystems that are NFS server exported filesystems.....

Well.. you cannot allow NFS client memory reclaim *while holding locks in*
filesystems that are NFS exported.

I think this is most effectively generalised to:
  you cannot allow FS memory reclaim while holding locks in filesystems which
  can be NFS exported

which I think is largely the case already - and lockdep can help us find
those places where we currently do allow FS reclaim while holding an FS lock.

Thanks,
NeilBrown
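
The rule argued for above ("if you are holding an FS lock, use GFP_NOFS")
can be sketched as a toy userspace model. This is only an illustration and
not kernel code: every MODEL_* name, struct model_task and the helper
functions are hypothetical. The only fact borrowed from the kernel is that
GFP_NOFS is GFP_KERNEL without the "may call into the filesystem" bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; NOT kernel code. All names here are hypothetical. */
#define MODEL_GFP_WAIT 0x1u /* allocation may sleep and enter direct reclaim */
#define MODEL_GFP_IO   0x2u /* reclaim may start block I/O */
#define MODEL_GFP_FS   0x4u /* reclaim may call back into filesystem code */

/* Mirrors the relationship between the real flags: NOFS drops only the FS bit. */
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)

/* Per-task count of filesystem locks (e.g. an i_mutex) currently held. */
struct model_task {
	int fs_locks_held;
};

static void model_fs_lock(struct model_task *t)
{
	t->fs_locks_held++;
}

static void model_fs_unlock(struct model_task *t)
{
	assert(t->fs_locks_held > 0);
	t->fs_locks_held--;
}

/*
 * An allocation is "risky" in this model when it can enter filesystem
 * reclaim while the caller already holds a filesystem lock: reclaim may
 * then wait on something (such as nfsd) that needs that same lock.
 */
static bool model_alloc_is_risky(const struct model_task *t, unsigned int gfp)
{
	bool can_enter_fs_reclaim =
		(gfp & MODEL_GFP_WAIT) && (gfp & MODEL_GFP_FS);

	return t->fs_locks_held > 0 && can_enter_fs_reclaim;
}
```

In this model a caller that has taken model_fs_lock() gets a "risky" result
for MODEL_GFP_KERNEL and a safe one for MODEL_GFP_NOFS, which is the kind of
violation lockdep is expected to flag in the real kernel.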