* [PATCH 0/2] xfs: fix a couple of potential deadlocks
From: Dave Chinner @ 2018-06-07 5:21 UTC
To: linux-xfs

Hi folks,

These are a couple of small fixes for lockdep enabled kernels. The
first moves the initialisation of the i_rwsem lockdep state into the
XFS code instead of leaving it to unlock_new_inode(), so that lockdep
does not re-initialise the lock state after the inode can be found in
the cache and may have other processes waiting on the lock.

The second adds the correct memory allocation context to
xfs_reflink_convert_cow(), as it gets called in the IO path where we
hold pages locked for IO and so we can't recurse back into memory
reclaim.

Cheers,

Dave.

* [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
From: Dave Chinner @ 2018-06-07 5:21 UTC
To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When lockdep is enabled, it changes the type of the inode i_rwsem
semaphore before unlocking a newly instantiated inode. There is the
possibility that there is already a waiter on that inode lock by the
time we unlock the new inode, so having lockdep re-initialise the
lock is a vector for trouble.

Avoid this whole situation by setting up the i_rwsem lockdep class
at the same time we set up the XFS inode i_ilock classes and so the
VFS doesn't have to change the lock class itself when it is
potentially unsafe.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_iops.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 29484091c0d2..3020c57fc125 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1258,6 +1258,14 @@ xfs_setup_inode(
 	xfs_diflags_to_iflags(inode, ip);
 
 	if (S_ISDIR(inode->i_mode)) {
+		/*
+		 * We set the i_rwsem class here to avoid potential races with
+		 * lockdep_annotate_inode_mutex_key() reinitialising the lock
+		 * after a filehandle lookup has already found the inode in
+		 * cache before it has been unlocked via unlock_new_inode().
+		 */
+		lockdep_set_class(&inode->i_rwsem,
+				  &inode->i_sb->s_type->i_mutex_dir_key);
 		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_dir_ilock_class);
 		ip->d_ops = ip->i_mount->m_dir_inode_ops;
 	} else {
-- 
2.17.0
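
The race described in the patch above can be illustrated with a toy userspace model. This is only a sketch under stated assumptions, not kernel code: `struct toy_lock`, the `toy_*` helpers and the two key objects are hypothetical names, and the behaviour of `toy_annotate_dir()` (re-initialise the lock only if it still carries the default class) is an assumption about how the VFS helper behaves, modelled here by resetting a waiter count.

```c
#include <assert.h>
#include <stddef.h>

/* Two distinct objects whose addresses stand in for lockdep class keys. */
static char toy_default_key;
static char toy_dir_key;

/* Hypothetical, heavily simplified lock: a class pointer and a waiter count. */
struct toy_lock {
	const char *class_key;
	int waiters;
};

static void toy_lock_init(struct toy_lock *lock)
{
	lock->class_key = &toy_default_key;
	lock->waiters = 0;
}

/* Changing only the class leaves the rest of the lock state alone. */
static void toy_set_class(struct toy_lock *lock, const char *key)
{
	lock->class_key = key;
}

/*
 * Assumed model of the unlock-time annotation: if the lock still has the
 * default class, re-initialise it (losing any waiters) and switch class.
 * If the filesystem already set the class, do nothing.
 */
static void toy_annotate_dir(struct toy_lock *lock)
{
	if (lock->class_key == &toy_default_key) {
		toy_lock_init(lock);
		toy_set_class(lock, &toy_dir_key);
	}
}

/*
 * Returns how many waiters survive the unlock-time annotation when one
 * waiter queued up after the object became findable in the cache.
 */
static int toy_waiters_after_unlock(int set_class_at_setup)
{
	struct toy_lock lock;

	toy_lock_init(&lock);
	if (set_class_at_setup)
		toy_set_class(&lock, &toy_dir_key);	/* the patched ordering */

	lock.waiters = 1;		/* a lookup found the inode and blocked */
	toy_annotate_dir(&lock);	/* happens at unlock time */

	assert(lock.class_key == &toy_dir_key);
	return lock.waiters;
}
```

In this model, `toy_waiters_after_unlock(0)` loses the waiter because the lock is re-initialised late, while `toy_waiters_after_unlock(1)` keeps it because the class was already in place before the object was published.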

* Re: [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
From: Dave Chinner @ 2018-06-07 5:32 UTC
To: linux-xfs

On Thu, Jun 07, 2018 at 03:21:31PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When lockdep is enabled, it changes the type of the inode i_rwsem
> semaphore before unlocking a newly instantiated inode. There is the
> possibility that there is already a waiter on that inode lock by the
> time we unlock the new inode, so having lockdep re-initialise the
> lock is a vector for trouble.
>
> Avoid this whole situation by setting up the i_rwsem lockdep class
> at the same time we set up the XFS inode i_ilock classes and so the
> VFS doesn't have to change the lock class itself when it is
> potentially unsafe.
>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>

I just realised that the VFS equivalent patch has made it upstream,
too, which would help explain this a bit more. Darrick, can you add
this to the commit message:

"This change is necessary because the equivalent fixes to the VFS
code made in commit 1e2e547a93a0 ("do d_instantiate/unlock_new_inode
combinations safely") are not relevant to XFS as it has its own
internal inode cache lookup and instantiation routines."

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
From: Brian Foster @ 2018-06-07 11:41 UTC
To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 03:32:36PM +1000, Dave Chinner wrote:
> On Thu, Jun 07, 2018 at 03:21:31PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > When lockdep is enabled, it changes the type of the inode i_rwsem
> > semaphore before unlocking a newly instantiated inode. There is the
> > possibility that there is already a waiter on that inode lock by the
> > time we unlock the new inode, so having lockdep re-initialise the
> > lock is a vector for trouble.
> >
> > Avoid this whole situation by setting up the i_rwsem lockdep class
> > at the same time we set up the XFS inode i_ilock classes and so the
> > VFS doesn't have to change the lock class itself when it is
> > potentially unsafe.
> >
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
>
> I just realised that the VFS equivalent patch has made it upstream,
> too, which would help explain this a bit more. Darrick, can you add
> this to the commit message:
>
> "This change is necessary because the equivalent fixes to the VFS
> code made in commit 1e2e547a93a0 ("do d_instantiate/unlock_new_inode
> combinations safely") are not relevant to XFS as it has its own
> internal inode cache lookup and instantiation routines."
>

The reference definitely helps, thanks. With that added:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
From: Allison Henderson @ 2018-06-07 5:50 UTC
To: Dave Chinner, linux-xfs

On 06/06/2018 10:21 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When lockdep is enabled, it changes the type of the inode i_rwsem
> semaphore before unlocking a newly instantiated inode. There is the
> possibility that there is already a waiter on that inode lock by the
> time we unlock the new inode, so having lockdep re-initialise the
> lock is a vector for trouble.
>
> Avoid this whole situation by setting up the i_rwsem lockdep class
> at the same time we set up the XFS inode i_ilock classes and so the
> VFS doesn't have to change the lock class itself when it is
> potentially unsafe.
>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_iops.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 29484091c0d2..3020c57fc125 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1258,6 +1258,14 @@ xfs_setup_inode(
>  	xfs_diflags_to_iflags(inode, ip);
>
>  	if (S_ISDIR(inode->i_mode)) {
> +		/*
> +		 * We set the i_rwsem class here to avoid potential races with
> +		 * lockdep_annotate_inode_mutex_key() reinitialising the lock
> +		 * after a filehandle lookup has already found the inode in
> +		 * cache before it has been unlocked via unlock_new_inode().
> +		 */
> +		lockdep_set_class(&inode->i_rwsem,
> +				  &inode->i_sb->s_type->i_mutex_dir_key);
>  		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_dir_ilock_class);
>  		ip->d_ops = ip->i_mount->m_dir_inode_ops;
>  	} else {
>

Ok, you can add my review:

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

* Re: [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
From: Darrick J. Wong @ 2018-06-07 14:53 UTC
To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 03:21:31PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When lockdep is enabled, it changes the type of the inode i_rwsem
> semaphore before unlocking a newly instantiated inode. There is the
> possibility that there is already a waiter on that inode lock by the
> time we unlock the new inode, so having lockdep re-initialise the
> lock is a vector for trouble.
>
> Avoid this whole situation by setting up the i_rwsem lockdep class
> at the same time we set up the XFS inode i_ilock classes and so the
> VFS doesn't have to change the lock class itself when it is
> potentially unsafe.
>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>

With the commit message changes added,

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iops.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 29484091c0d2..3020c57fc125 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1258,6 +1258,14 @@ xfs_setup_inode(
>  	xfs_diflags_to_iflags(inode, ip);
>
>  	if (S_ISDIR(inode->i_mode)) {
> +		/*
> +		 * We set the i_rwsem class here to avoid potential races with
> +		 * lockdep_annotate_inode_mutex_key() reinitialising the lock
> +		 * after a filehandle lookup has already found the inode in
> +		 * cache before it has been unlocked via unlock_new_inode().
> +		 */
> +		lockdep_set_class(&inode->i_rwsem,
> +				  &inode->i_sb->s_type->i_mutex_dir_key);
>  		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_dir_ilock_class);
>  		ip->d_ops = ip->i_mount->m_dir_inode_ops;
>  	} else {
> --
> 2.17.0

* [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
From: Dave Chinner @ 2018-06-07 5:21 UTC
To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_reflink_convert_cow() manipulates the incore extent list
in GFP_KERNEL context in the IO submission path whilst holding
locked pages under writeback. This is a memory reclaim deadlock
vector. This code is not in a transaction, so any memory allocations
it makes aren't protected via the memalloc_nofs_save() context that
transactions carry.

Hence we need to run this call under memalloc_nofs_save() context to
prevent potential memory allocations from being run as GFP_KERNEL
and deadlocking.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c  | 11 +++++++++++
 fs/xfs/xfs_buf.c   |  1 -
 fs/xfs/xfs_linux.h |  1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 767d53222f31..1eb625fdcb1e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -531,8 +531,19 @@ xfs_submit_ioend(
 {
 	/* Convert CoW extents to regular */
 	if (!status && ioend->io_type == XFS_IO_COW) {
+		/*
+		 * Yuk. This can do memory allocation, but is not a
+		 * transactional operation so everything is done in GFP_KERNEL
+		 * context. That can deadlock, because we hold pages in
+		 * writeback state and GFP_KERNEL allocations can block on them.
+		 * Hence we must operate in nofs conditions here.
+		 */
+		unsigned nofs_flag;
+
+		nofs_flag = memalloc_nofs_save();
 		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
 				ioend->io_offset, ioend->io_size);
+		memalloc_nofs_restore(nofs_flag);
 	}
 
 	/* Reserve log space if we might write beyond the on-disk inode size. */
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 980bc48979e9..e9c058e3761c 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -21,7 +21,6 @@
 #include <linux/migrate.h>
 #include <linux/backing-dev.h>
 #include <linux/freezer.h>
-#include <linux/sched/mm.h>
 
 #include "xfs_format.h"
 #include "xfs_log_format.h"
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index ae1e66fa3f61..1631cf4546f2 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -26,6 +26,7 @@ typedef __u32 xfs_nlink_t;
 
 #include <linux/semaphore.h>
 #include <linux/mm.h>
+#include <linux/sched/mm.h>
 #include <linux/kernel.h>
 #include <linux/blkdev.h>
 #include <linux/slab.h>
-- 
2.17.0
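
The save/restore pattern used in the patch above can be sketched as a small userspace model. Everything here is a hypothetical stand-in: the `TOY_*` flag values are arbitrary, `toy_task_flags` models a per-task flags word, and `toy_convert_may_enter_fs()` stands in for a callee that always requests a GFP_KERNEL-style allocation. The sketch only shows why a scoped per-task marker can change what a deeply nested allocation is allowed to do without touching the callee.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy allocation flags; values are arbitrary and not the kernel's. */
#define TOY_GFP_FS	0x1u	/* allocation may recurse into the filesystem */
#define TOY_GFP_IO	0x2u
#define TOY_GFP_KERNEL	(TOY_GFP_FS | TOY_GFP_IO)
#define TOY_PF_NOFS	0x100u	/* per-task "no FS recursion" marker */

/* Models the flags word of the single task in this toy. */
static unsigned int toy_task_flags;

/* Enter a nofs scope; return the previous state of the marker bit. */
static unsigned int toy_nofs_save(void)
{
	unsigned int old = toy_task_flags & TOY_PF_NOFS;

	toy_task_flags |= TOY_PF_NOFS;
	return old;
}

/* Leave a nofs scope, restoring what the matching save observed. */
static void toy_nofs_restore(unsigned int old)
{
	toy_task_flags = (toy_task_flags & ~TOY_PF_NOFS) | old;
}

/* Effective mask: the FS bit is stripped while inside a nofs scope. */
static unsigned int toy_effective_gfp(unsigned int gfp)
{
	if (toy_task_flags & TOY_PF_NOFS)
		gfp &= ~TOY_GFP_FS;
	return gfp;
}

/* A callee that always asks for a GFP_KERNEL-style allocation. */
static bool toy_convert_may_enter_fs(void)
{
	return (toy_effective_gfp(TOY_GFP_KERNEL) & TOY_GFP_FS) != 0;
}

/* Same shape as the patched call site: save, call, restore. */
static bool toy_submit_ioend(void)
{
	unsigned int nofs_flag;
	bool entered_fs;

	nofs_flag = toy_nofs_save();
	entered_fs = toy_convert_may_enter_fs();
	toy_nofs_restore(nofs_flag);
	return entered_fs;
}
```

Called directly, `toy_convert_may_enter_fs()` reports that the allocation could recurse into the filesystem; called through `toy_submit_ioend()` it cannot, and the task flags are back to their previous value afterwards.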

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
From: Allison Henderson @ 2018-06-07 5:56 UTC
To: Dave Chinner, linux-xfs

On 06/06/2018 10:21 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_reflink_convert_cow() manipulates the incore extent list
> in GFP_KERNEL context in the IO submission path whilst holding
> locked pages under writeback. This is a memory reclaim deadlock
> vector. This code is not in a transaction, so any memory allocations
> it makes aren't protected via the memalloc_nofs_save() context that
> transactions carry.
>
> Hence we need to run this call under memalloc_nofs_save() context to
> prevent potential memory allocations from being run as GFP_KERNEL
> and deadlocking.
>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_aops.c  | 11 +++++++++++
>  fs/xfs/xfs_buf.c   |  1 -
>  fs/xfs/xfs_linux.h |  1 +
>  3 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 767d53222f31..1eb625fdcb1e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -531,8 +531,19 @@ xfs_submit_ioend(
>  {
>  	/* Convert CoW extents to regular */
>  	if (!status && ioend->io_type == XFS_IO_COW) {
> +		/*
> +		 * Yuk. This can do memory allocation, but is not a
> +		 * transactional operation so everything is done in GFP_KERNEL
> +		 * context. That can deadlock, because we hold pages in
> +		 * writeback state and GFP_KERNEL allocations can block on them.
> +		 * Hence we must operate in nofs conditions here.
> +		 */
> +		unsigned nofs_flag;
> +
> +		nofs_flag = memalloc_nofs_save();
>  		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
>  				ioend->io_offset, ioend->io_size);
> +		memalloc_nofs_restore(nofs_flag);
>  	}
>
>  	/* Reserve log space if we might write beyond the on-disk inode size. */
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 980bc48979e9..e9c058e3761c 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -21,7 +21,6 @@
>  #include <linux/migrate.h>
>  #include <linux/backing-dev.h>
>  #include <linux/freezer.h>
> -#include <linux/sched/mm.h>
>
>  #include "xfs_format.h"
>  #include "xfs_log_format.h"
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index ae1e66fa3f61..1631cf4546f2 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -26,6 +26,7 @@ typedef __u32 xfs_nlink_t;
>
>  #include <linux/semaphore.h>
>  #include <linux/mm.h>
> +#include <linux/sched/mm.h>
>  #include <linux/kernel.h>
>  #include <linux/blkdev.h>
>  #include <linux/slab.h>
>

Looks ok. Was moving the header include intentional? Just clean up maybe?
Other than that, looks good.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
From: Darrick J. Wong @ 2018-06-07 14:46 UTC
To: Allison Henderson; +Cc: Dave Chinner, linux-xfs

On Wed, Jun 06, 2018 at 10:56:50PM -0700, Allison Henderson wrote:
> On 06/06/2018 10:21 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > xfs_reflink_convert_cow() manipulates the incore extent list
> > in GFP_KERNEL context in the IO submission path whilst holding
> > locked pages under writeback. This is a memory reclaim deadlock
> > vector. This code is not in a transaction, so any memory allocations
> > it makes aren't protected via the memalloc_nofs_save() context that
> > transactions carry.
> >
> > Hence we need to run this call under memalloc_nofs_save() context to
> > prevent potential memory allocations from being run as GFP_KERNEL
> > and deadlocking.
> >
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_aops.c  | 11 +++++++++++
> >  fs/xfs/xfs_buf.c   |  1 -
> >  fs/xfs/xfs_linux.h |  1 +
> >  3 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 767d53222f31..1eb625fdcb1e 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -531,8 +531,19 @@ xfs_submit_ioend(
> >  {
> >  	/* Convert CoW extents to regular */
> >  	if (!status && ioend->io_type == XFS_IO_COW) {
> > +		/*
> > +		 * Yuk. This can do memory allocation, but is not a
> > +		 * transactional operation so everything is done in GFP_KERNEL
> > +		 * context. That can deadlock, because we hold pages in
> > +		 * writeback state and GFP_KERNEL allocations can block on them.
> > +		 * Hence we must operate in nofs conditions here.
> > +		 */
> > +		unsigned nofs_flag;
> > +
> > +		nofs_flag = memalloc_nofs_save();
> >  		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
> >  				ioend->io_offset, ioend->io_size);
> > +		memalloc_nofs_restore(nofs_flag);

DOH. :)

> >  	}
> >
> >  	/* Reserve log space if we might write beyond the on-disk inode size. */
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 980bc48979e9..e9c058e3761c 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -21,7 +21,6 @@
> >  #include <linux/migrate.h>
> >  #include <linux/backing-dev.h>
> >  #include <linux/freezer.h>
> > -#include <linux/sched/mm.h>
> >
> >  #include "xfs_format.h"
> >  #include "xfs_log_format.h"
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index ae1e66fa3f61..1631cf4546f2 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -26,6 +26,7 @@ typedef __u32 xfs_nlink_t;
> >
> >  #include <linux/semaphore.h>
> >  #include <linux/mm.h>
> > +#include <linux/sched/mm.h>
> >  #include <linux/kernel.h>
> >  #include <linux/blkdev.h>
> >  #include <linux/slab.h>
> >
> Looks ok. Was moving the header include intentional? Just clean up maybe?
> Other than that, looks good.

I can't speak for Dave, but I'll point out that memalloc_nofs_restore is
declared in linux/sched/mm.h, so the #include hoist makes the symbol
available to the aops code in such a manner that now it's available to
all the xfs code so that we don't have to remember this...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

>
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
From: Dave Chinner @ 2018-06-07 22:08 UTC
To: Darrick J. Wong; +Cc: Allison Henderson, linux-xfs

On Thu, Jun 07, 2018 at 07:46:31AM -0700, Darrick J. Wong wrote:
> On Wed, Jun 06, 2018 at 10:56:50PM -0700, Allison Henderson wrote:
> > On 06/06/2018 10:21 PM, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > xfs_reflink_convert_cow() manipulates the incore extent list
> > > in GFP_KERNEL context in the IO submission path whilst holding
> > > locked pages under writeback. This is a memory reclaim deadlock
> > > vector. This code is not in a transaction, so any memory allocations
> > > it makes aren't protected via the memalloc_nofs_save() context that
> > > transactions carry.
> > >
> > > Hence we need to run this call under memalloc_nofs_save() context to
> > > prevent potential memory allocations from being run as GFP_KERNEL
> > > and deadlocking.
> > >
> > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_aops.c  | 11 +++++++++++
> > >  fs/xfs/xfs_buf.c   |  1 -
> > >  fs/xfs/xfs_linux.h |  1 +
> > >  3 files changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > index 767d53222f31..1eb625fdcb1e 100644
> > > --- a/fs/xfs/xfs_aops.c
> > > +++ b/fs/xfs/xfs_aops.c
> > > @@ -531,8 +531,19 @@ xfs_submit_ioend(
> > >  {
> > >  	/* Convert CoW extents to regular */
> > >  	if (!status && ioend->io_type == XFS_IO_COW) {
> > > +		/*
> > > +		 * Yuk. This can do memory allocation, but is not a
> > > +		 * transactional operation so everything is done in GFP_KERNEL
> > > +		 * context. That can deadlock, because we hold pages in
> > > +		 * writeback state and GFP_KERNEL allocations can block on them.
> > > +		 * Hence we must operate in nofs conditions here.
> > > +		 */
> > > +		unsigned nofs_flag;
> > > +
> > > +		nofs_flag = memalloc_nofs_save();
> > >  		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
> > >  				ioend->io_offset, ioend->io_size);
> > > +		memalloc_nofs_restore(nofs_flag);
>
> DOH. :)
>
> > >  	}
> > >
> > >  	/* Reserve log space if we might write beyond the on-disk inode size. */
> > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > index 980bc48979e9..e9c058e3761c 100644
> > > --- a/fs/xfs/xfs_buf.c
> > > +++ b/fs/xfs/xfs_buf.c
> > > @@ -21,7 +21,6 @@
> > >  #include <linux/migrate.h>
> > >  #include <linux/backing-dev.h>
> > >  #include <linux/freezer.h>
> > > -#include <linux/sched/mm.h>
> > >
> > >  #include "xfs_format.h"
> > >  #include "xfs_log_format.h"
> > > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > > index ae1e66fa3f61..1631cf4546f2 100644
> > > --- a/fs/xfs/xfs_linux.h
> > > +++ b/fs/xfs/xfs_linux.h
> > > @@ -26,6 +26,7 @@ typedef __u32 xfs_nlink_t;
> > >
> > >  #include <linux/semaphore.h>
> > >  #include <linux/mm.h>
> > > +#include <linux/sched/mm.h>
> > >  #include <linux/kernel.h>
> > >  #include <linux/blkdev.h>
> > >  #include <linux/slab.h>
> > >
> > Looks ok. Was moving the header include intentional? Just clean up maybe?
> > Other than that, looks good.
>
> I can't speak for Dave, but I'll point out that memalloc_nofs_restore is
> declared in linux/sched/mm.h, so the #include hoist makes the symbol
> available to the aops code in such a manner that now it's available to
> all the xfs code so that we don't have to remember this...

*nod*

That's historically how we've handled OS level includes - it keeps
the XFS files to including XFS headers only and that makes things
like userspace code syncs a lot easier.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
From: Brian Foster @ 2018-06-07 11:41 UTC
To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 03:21:32PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_reflink_convert_cow() manipulates the incore extent list
> in GFP_KERNEL context in the IO submission path whilst holding
> locked pages under writeback. This is a memory reclaim deadlock
> vector. This code is not in a transaction, so any memory allocations
> it makes aren't protected via the memalloc_nofs_save() context that
> transactions carry.
>
> Hence we need to run this call under memalloc_nofs_save() context to
> prevent potential memory allocations from being run as GFP_KERNEL
> and deadlocking.
>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---

Looks fine modulo the header thing Allison already pointed out:

Reviewed-by: Brian Foster <bfoster@redhat.com>

BTW, shouldn't we also be using XFS_TRANS_NOFS in
xfs_iomap_write_allocate()?

Brian

>  fs/xfs/xfs_aops.c  | 11 +++++++++++
>  fs/xfs/xfs_buf.c   |  1 -
>  fs/xfs/xfs_linux.h |  1 +
>  3 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 767d53222f31..1eb625fdcb1e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -531,8 +531,19 @@ xfs_submit_ioend(
>  {
>  	/* Convert CoW extents to regular */
>  	if (!status && ioend->io_type == XFS_IO_COW) {
> +		/*
> +		 * Yuk. This can do memory allocation, but is not a
> +		 * transactional operation so everything is done in GFP_KERNEL
> +		 * context. That can deadlock, because we hold pages in
> +		 * writeback state and GFP_KERNEL allocations can block on them.
> +		 * Hence we must operate in nofs conditions here.
> +		 */
> +		unsigned nofs_flag;
> +
> +		nofs_flag = memalloc_nofs_save();
>  		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
>  				ioend->io_offset, ioend->io_size);
> +		memalloc_nofs_restore(nofs_flag);
>  	}
>
>  	/* Reserve log space if we might write beyond the on-disk inode size. */
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 980bc48979e9..e9c058e3761c 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -21,7 +21,6 @@
>  #include <linux/migrate.h>
>  #include <linux/backing-dev.h>
>  #include <linux/freezer.h>
> -#include <linux/sched/mm.h>
>
>  #include "xfs_format.h"
>  #include "xfs_log_format.h"
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index ae1e66fa3f61..1631cf4546f2 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -26,6 +26,7 @@ typedef __u32 xfs_nlink_t;
>
>  #include <linux/semaphore.h>
>  #include <linux/mm.h>
> +#include <linux/sched/mm.h>
>  #include <linux/kernel.h>
>  #include <linux/blkdev.h>
>  #include <linux/slab.h>
> --
> 2.17.0

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
From: Darrick J. Wong @ 2018-06-07 14:48 UTC
To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Thu, Jun 07, 2018 at 07:41:40AM -0400, Brian Foster wrote:
> On Thu, Jun 07, 2018 at 03:21:32PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > xfs_reflink_convert_cow() manipulates the incore extent list
> > in GFP_KERNEL context in the IO submission path whilst holding
> > locked pages under writeback. This is a memory reclaim deadlock
> > vector. This code is not in a transaction, so any memory allocations
> > it makes aren't protected via the memalloc_nofs_save() context that
> > transactions carry.
> >
> > Hence we need to run this call under memalloc_nofs_save() context to
> > prevent potential memory allocations from being run as GFP_KERNEL
> > and deadlocking.
> >
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
>
> Looks fine modulo the header thing Allison already pointed out:
>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
>
> BTW, shouldn't we also be using XFS_TRANS_NOFS in
> xfs_iomap_write_allocate()?

/me squints at the usage and thinks ... yes probably. :)

--D

> Brian
>
> >  fs/xfs/xfs_aops.c  | 11 +++++++++++
> >  fs/xfs/xfs_buf.c   |  1 -
> >  fs/xfs/xfs_linux.h |  1 +
> >  3 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 767d53222f31..1eb625fdcb1e 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -531,8 +531,19 @@ xfs_submit_ioend(
> >  {
> >  	/* Convert CoW extents to regular */
> >  	if (!status && ioend->io_type == XFS_IO_COW) {
> > +		/*
> > +		 * Yuk. This can do memory allocation, but is not a
> > +		 * transactional operation so everything is done in GFP_KERNEL
> > +		 * context. That can deadlock, because we hold pages in
> > +		 * writeback state and GFP_KERNEL allocations can block on them.
> > +		 * Hence we must operate in nofs conditions here.
> > +		 */
> > +		unsigned nofs_flag;
> > +
> > +		nofs_flag = memalloc_nofs_save();
> >  		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
> >  				ioend->io_offset, ioend->io_size);
> > +		memalloc_nofs_restore(nofs_flag);
> >  	}
> >
> >  	/* Reserve log space if we might write beyond the on-disk inode size. */
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 980bc48979e9..e9c058e3761c 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -21,7 +21,6 @@
> >  #include <linux/migrate.h>
> >  #include <linux/backing-dev.h>
> >  #include <linux/freezer.h>
> > -#include <linux/sched/mm.h>
> >
> >  #include "xfs_format.h"
> >  #include "xfs_log_format.h"
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index ae1e66fa3f61..1631cf4546f2 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -26,6 +26,7 @@ typedef __u32 xfs_nlink_t;
> >
> >  #include <linux/semaphore.h>
> >  #include <linux/mm.h>
> > +#include <linux/sched/mm.h>
> >  #include <linux/kernel.h>
> >  #include <linux/blkdev.h>
> >  #include <linux/slab.h>
> > --
> > 2.17.0

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
From: Dave Chinner @ 2018-06-08 0:48 UTC
To: Brian Foster; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 07:41:40AM -0400, Brian Foster wrote:
> On Thu, Jun 07, 2018 at 03:21:32PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > xfs_reflink_convert_cow() manipulates the incore extent list
> > in GFP_KERNEL context in the IO submission path whilst holding
> > locked pages under writeback. This is a memory reclaim deadlock
> > vector. This code is not in a transaction, so any memory allocations
> > it makes aren't protected via the memalloc_nofs_save() context that
> > transactions carry.
> >
> > Hence we need to run this call under memalloc_nofs_save() context to
> > prevent potential memory allocations from being run as GFP_KERNEL
> > and deadlocking.
> >
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
>
> Looks fine modulo the header thing Allison already pointed out:
>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
>
> BTW, shouldn't we also be using XFS_TRANS_NOFS in
> xfs_iomap_write_allocate()?

Most likely, yes. However, I think the whole ->writepages path
should be moved under memalloc_nofs_save() context. I'm kinda
waiting for the bufferhead removal to land before auditing it fully
and determining the scope we should be covering by nofs alloc
contexts. This was just a drive-by patch to fix a problem that was
reported...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
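
The idea in the message above of covering a whole path with one nofs scope, while inner helpers keep their own save/restore pairs, can be sketched with a toy userspace model. All names here (`toy_task_flags`, `toy_inner_step()`, `toy_writepages()`) are hypothetical and the flag value is arbitrary; the sketch only assumes that save returns the previous marker state and that restore puts exactly that state back, so nested scopes compose.

```c
#include <assert.h>

#define TOY_PF_NOFS 0x100u	/* arbitrary per-task marker bit */

/* Models the flags word of the single task in this toy. */
static unsigned int toy_task_flags;

static unsigned int toy_nofs_save(void)
{
	unsigned int old = toy_task_flags & TOY_PF_NOFS;

	toy_task_flags |= TOY_PF_NOFS;
	return old;
}

static void toy_nofs_restore(unsigned int old)
{
	toy_task_flags = (toy_task_flags & ~TOY_PF_NOFS) | old;
}

static int toy_in_nofs(void)
{
	return (toy_task_flags & TOY_PF_NOFS) != 0;
}

/* An inner helper that opens its own scope around one call. */
static int toy_inner_step(void)
{
	unsigned int flag = toy_nofs_save();
	int covered = toy_in_nofs();

	toy_nofs_restore(flag);
	return covered;
}

/*
 * A hypothetical outer path that holds one scope across all steps.
 * Returns how many inner steps ran under nofs and reports whether the
 * outer scope was still active after the inner restores.
 */
static int toy_writepages(int nsteps, int *still_nofs_after_inner)
{
	unsigned int flag = toy_nofs_save();
	int covered = 0;
	int i;

	for (i = 0; i < nsteps; i++)
		covered += toy_inner_step();

	*still_nofs_after_inner = toy_in_nofs();
	toy_nofs_restore(flag);
	return covered;
}
```

Because each restore reinstates the state seen by its matching save, the inner helper leaving its scope does not end the outer one; only the outermost restore clears the marker.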

Thread overview: 13+ messages
2018-06-07  5:21 [PATCH 0/2] xfs: fix a couple of potential deadlocks Dave Chinner
2018-06-07  5:21 ` [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly Dave Chinner
2018-06-07  5:32   ` Dave Chinner
2018-06-07 11:41     ` Brian Foster
2018-06-07  5:50   ` Allison Henderson
2018-06-07 14:53   ` Darrick J. Wong
2018-06-07  5:21 ` [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock Dave Chinner
2018-06-07  5:56   ` Allison Henderson
2018-06-07 14:46     ` Darrick J. Wong
2018-06-07 22:08       ` Dave Chinner
2018-06-07 11:41   ` Brian Foster
2018-06-07 14:48     ` Darrick J. Wong
2018-06-08  0:48     ` Dave Chinner