* [PATCH 0/2] xfs: fix a couple of potential deadlocks
@ 2018-06-07  5:21 Dave Chinner
  2018-06-07  5:21 ` [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly Dave Chinner
  2018-06-07  5:21 ` [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock Dave Chinner
  0 siblings, 2 replies; 13+ messages in thread
From: Dave Chinner @ 2018-06-07  5:21 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

These are a couple of small fixes for lockdep-enabled kernels. The
first sets up the i_rwsem lockdep state in the XFS code rather than
leaving it to unlock_new_inode(). This stops lockdep re-initialising
the lock state after the inode can already be found in the cache and
may have other processes waiting on the lock.

The second adds the correct memory allocation context to
xfs_reflink_convert_cow(). It is called in the IO path where we hold
pages locked for IO, so we can't recurse back into memory reclaim.

Cheers,

Dave.


* [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
  2018-06-07  5:21 [PATCH 0/2] xfs: fix a couple of potential deadlocks Dave Chinner
@ 2018-06-07  5:21 ` Dave Chinner
  2018-06-07  5:32   ` Dave Chinner
                     ` (2 more replies)
  2018-06-07  5:21 ` [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock Dave Chinner
  1 sibling, 3 replies; 13+ messages in thread
From: Dave Chinner @ 2018-06-07  5:21 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When lockdep is enabled, it changes the type of the inode i_rwsem
semaphore before unlocking a newly instantiated inode. There is the
possibility that there is already a waiter on that inode lock by the
time we unlock the new inode, so having lockdep re-initialise the
lock is a vector for trouble.

Avoid this whole situation by setting up the i_rwsem lockdep class
at the same time we set up the XFS inode i_ilock classes and so the
VFS doesn't have to change the lock class itself when it is
potentially unsafe.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_iops.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 29484091c0d2..3020c57fc125 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1258,6 +1258,14 @@ xfs_setup_inode(
 	xfs_diflags_to_iflags(inode, ip);
 
 	if (S_ISDIR(inode->i_mode)) {
+		/*
+		 * We set the i_rwsem class here to avoid potential races with
+		 * lockdep_annotate_inode_mutex_key() reinitialising the lock
+		 * after a filehandle lookup has already found the inode in
+		 * cache before it has been unlocked via unlock_new_inode().
+		 */
+		lockdep_set_class(&inode->i_rwsem,
+				  &inode->i_sb->s_type->i_mutex_dir_key);
 		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_dir_ilock_class);
 		ip->d_ops = ip->i_mount->m_dir_inode_ops;
 	} else {
-- 
2.17.0
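
The ordering problem in the commit message can be illustrated with a small user-space sketch. This is a toy model, not kernel code: every `toy_*` name is hypothetical, and it assumes (as a simplification of the VFS behaviour) that the late annotation only reinitialises a lock that still carries the default class. Under that assumption, setting the directory class early at inode setup means the late reinitialisation is skipped and any waiter already queued on the lock is left alone.

```c
#include <stdbool.h>

/* Toy stand-in for an rwsem; not the kernel's struct rw_semaphore. */
struct toy_rwsem {
	int waiters;		/* tasks currently queued on the lock */
	const void *class_key;	/* stand-in for the lockdep class */
};

/* Distinct addresses act as class identities (hypothetical keys). */
static const char toy_default_key = 0;
static const char toy_dir_key = 0;

/* Full (re)initialisation: forgets waiters and sets the class. */
static void toy_init_rwsem(struct toy_rwsem *sem, const void *key)
{
	sem->waiters = 0;
	sem->class_key = key;
}

/* Early, setup-time class change that leaves the lock state alone. */
static void toy_set_class(struct toy_rwsem *sem, const void *key)
{
	sem->class_key = key;
}

/*
 * Late annotation at unlock time. ASSUMPTION: it only reinitialises
 * a lock that still has the default class. Returns true if the lock
 * was reinitialised.
 */
static bool toy_annotate_dir(struct toy_rwsem *sem)
{
	if (sem->class_key != &toy_default_key)
		return false;
	toy_init_rwsem(sem, &toy_dir_key);
	return true;
}
```

In this model, a lock left at the default class loses its waiter count when `toy_annotate_dir()` runs, whereas a lock whose class was already set with `toy_set_class()` keeps it.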


* [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
  2018-06-07  5:21 [PATCH 0/2] xfs: fix a couple of potential deadlocks Dave Chinner
  2018-06-07  5:21 ` [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly Dave Chinner
@ 2018-06-07  5:21 ` Dave Chinner
  2018-06-07  5:56   ` Allison Henderson
  2018-06-07 11:41   ` Brian Foster
  1 sibling, 2 replies; 13+ messages in thread
From: Dave Chinner @ 2018-06-07  5:21 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_reflink_convert_cow() manipulates the incore extent list
in GFP_KERNEL context in the IO submission path whilst holding
locked pages under writeback. This is a memory reclaim deadlock
vector. This code is not in a transaction, so any memory allocations
it makes aren't protected via the memalloc_nofs_save() context that
transactions carry.

Hence we need to run this call under memalloc_nofs_save() context to
prevent potential memory allocations from being run as GFP_KERNEL
and deadlocking.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c  | 11 +++++++++++
 fs/xfs/xfs_buf.c   |  1 -
 fs/xfs/xfs_linux.h |  1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 767d53222f31..1eb625fdcb1e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -531,8 +531,19 @@ xfs_submit_ioend(
 {
 	/* Convert CoW extents to regular */
 	if (!status && ioend->io_type == XFS_IO_COW) {
+		/*
+		 * Yuk. This can do memory allocation, but is not a
+		 * transactional operation so everything is done in GFP_KERNEL
+		 * context. That can deadlock, because we hold pages in
+		 * writeback state and GFP_KERNEL allocations can block on them.
+		 * Hence we must operate in nofs conditions here.
+		 */
+		unsigned nofs_flag;
+
+		nofs_flag = memalloc_nofs_save();
 		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
 				ioend->io_offset, ioend->io_size);
+		memalloc_nofs_restore(nofs_flag);
 	}
 
 	/* Reserve log space if we might write beyond the on-disk inode size. */
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 980bc48979e9..e9c058e3761c 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -21,7 +21,6 @@
 #include <linux/migrate.h>
 #include <linux/backing-dev.h>
 #include <linux/freezer.h>
-#include <linux/sched/mm.h>
 
 #include "xfs_format.h"
 #include "xfs_log_format.h"
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index ae1e66fa3f61..1631cf4546f2 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -26,6 +26,7 @@ typedef __u32			xfs_nlink_t;
 
 #include <linux/semaphore.h>
 #include <linux/mm.h>
+#include <linux/sched/mm.h>
 #include <linux/kernel.h>
 #include <linux/blkdev.h>
 #include <linux/slab.h>
-- 
2.17.0
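
For readers unfamiliar with the scoped allocation context, here is a minimal user-space sketch of the save/restore pattern the patch uses. It is a model only: the `toy_*` functions, the `TOY_*` flag values and the single global standing in for per-task flags are all hypothetical, and the real kernel helpers operate on the current task. The point it tries to show is that a callee asking for a GFP_KERNEL-style mask has the "may recurse into the filesystem" bit stripped while the scope is active, and that returning the previous flag state lets scopes nest.

```c
/* Hypothetical flag values for this sketch; not the kernel's. */
#define TOY_GFP_IO	0x1u
#define TOY_GFP_FS	0x2u
#define TOY_GFP_KERNEL	(TOY_GFP_IO | TOY_GFP_FS)
#define TOY_PF_NOFS	0x1u

/* Stands in for the per-task flags word. */
static unsigned int toy_task_flags;

/* Enter a NOFS scope and return the previous flag state. */
static unsigned int toy_nofs_save(void)
{
	unsigned int old = toy_task_flags & TOY_PF_NOFS;

	toy_task_flags |= TOY_PF_NOFS;
	return old;
}

/* Leave the scope, restoring whatever state the caller saved. */
static void toy_nofs_restore(unsigned int old)
{
	toy_task_flags = (toy_task_flags & ~TOY_PF_NOFS) | old;
}

/* Mask an allocator would effectively use for a requested mask. */
static unsigned int toy_effective_gfp(unsigned int gfp)
{
	if (toy_task_flags & TOY_PF_NOFS)
		gfp &= ~TOY_GFP_FS;
	return gfp;
}

/* A callee that always asks for GFP_KERNEL, unaware of its caller. */
static unsigned int toy_convert_cow(void)
{
	return toy_effective_gfp(TOY_GFP_KERNEL);
}

/* Same shape as the patched call site: save, call, restore. */
static unsigned int toy_submit_ioend(void)
{
	unsigned int nofs_flag, used;

	nofs_flag = toy_nofs_save();
	used = toy_convert_cow();
	toy_nofs_restore(nofs_flag);
	return used;
}
```

Called outside any scope, `toy_convert_cow()` sees the full mask; called through `toy_submit_ioend()`, it sees the mask without the FS bit, and an enclosing scope is still in force after the inner restore.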


* Re: [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
  2018-06-07  5:21 ` [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly Dave Chinner
@ 2018-06-07  5:32   ` Dave Chinner
  2018-06-07 11:41     ` Brian Foster
  2018-06-07  5:50   ` Allison Henderson
  2018-06-07 14:53   ` Darrick J. Wong
  2 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2018-06-07  5:32 UTC (permalink / raw)
  To: linux-xfs

On Thu, Jun 07, 2018 at 03:21:31PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When lockdep is enabled, it changes the type of the inode i_rwsem
> semaphore before unlocking a newly instantiated inode. There is the
> possibility that there is already a waiter on that inode lock by the
> time we unlock the new inode, so having lockdep re-initialise the
> lock is a vector for trouble.
> 
> Avoid this whole situation by setting up the i_rwsem lockdep class
> at the same time we set up the XFS inode i_ilock classes and so the
> VFS doesn't have to change the lock class itself when it is
> potentially unsafe.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>

I just realised that the VFS equivalent patch has made it upstream,
too, which would help explain this a bit more. Darrick, can you add
this to the commit message:

"This change is necessary because the equivalent fixes to the VFS
code made in commit 1e2e547a93a0 ("do d_instantiate/unlock_new_inode
combinations safely") are not relevant to XFS as it has its own
internal inode cache lookup and instantiation routines."

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
  2018-06-07  5:21 ` [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly Dave Chinner
  2018-06-07  5:32   ` Dave Chinner
@ 2018-06-07  5:50   ` Allison Henderson
  2018-06-07 14:53   ` Darrick J. Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Allison Henderson @ 2018-06-07  5:50 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On 06/06/2018 10:21 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When lockdep is enabled, it changes the type of the inode i_rwsem
> semaphore before unlocking a newly instantiated inode. There is the
> possibility that there is already a waiter on that inode lock by the
> time we unlock the new inode, so having lockdep re-initialise the
> lock is a vector for trouble.
> 
> Avoid this whole situation by setting up the i_rwsem lockdep class
> at the same time we set up the XFS inode i_ilock classes and so the
> VFS doesn't have to change the lock class itself when it is
> potentially unsafe.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>   fs/xfs/xfs_iops.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 29484091c0d2..3020c57fc125 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1258,6 +1258,14 @@ xfs_setup_inode(
>   	xfs_diflags_to_iflags(inode, ip);
>   
>   	if (S_ISDIR(inode->i_mode)) {
> +		/*
> +		 * We set the i_rwsem class here to avoid potential races with
> +		 * lockdep_annotate_inode_mutex_key() reinitialising the lock
> +		 * after a filehandle lookup has already found the inode in
> +		 * cache before it has been unlocked via unlock_new_inode().
> +		 */
> +		lockdep_set_class(&inode->i_rwsem,
> +				  &inode->i_sb->s_type->i_mutex_dir_key);
>   		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_dir_ilock_class);
>   		ip->d_ops = ip->i_mount->m_dir_inode_ops;
>   	} else {
> 
Ok, you can add my review:
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
  2018-06-07  5:21 ` [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock Dave Chinner
@ 2018-06-07  5:56   ` Allison Henderson
  2018-06-07 14:46     ` Darrick J. Wong
  2018-06-07 11:41   ` Brian Foster
  1 sibling, 1 reply; 13+ messages in thread
From: Allison Henderson @ 2018-06-07  5:56 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On 06/06/2018 10:21 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_reflink_convert_cow() manipulates the incore extent list
> in GFP_KERNEL context in the IO submission path whilst holding
> locked pages under writeback. This is a memory reclaim deadlock
> vector. This code is not in a transaction, so any memory allocations
> it makes aren't protected via the memalloc_nofs_save() context that
> transactions carry.
> 
> Hence we need to run this call under memalloc_nofs_save() context to
> prevent potential memory allocations from being run as GFP_KERNEL
> and deadlocking.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>   fs/xfs/xfs_aops.c  | 11 +++++++++++
>   fs/xfs/xfs_buf.c   |  1 -
>   fs/xfs/xfs_linux.h |  1 +
>   3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 767d53222f31..1eb625fdcb1e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -531,8 +531,19 @@ xfs_submit_ioend(
>   {
>   	/* Convert CoW extents to regular */
>   	if (!status && ioend->io_type == XFS_IO_COW) {
> +		/*
> +		 * Yuk. This can do memory allocation, but is not a
> +		 * transactional operation so everything is done in GFP_KERNEL
> +		 * context. That can deadlock, because we hold pages in
> +		 * writeback state and GFP_KERNEL allocations can block on them.
> +		 * Hence we must operate in nofs conditions here.
> +		 */
> +		unsigned nofs_flag;
> +
> +		nofs_flag = memalloc_nofs_save();
>   		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
>   				ioend->io_offset, ioend->io_size);
> +		memalloc_nofs_restore(nofs_flag);
>   	}
>   
>   	/* Reserve log space if we might write beyond the on-disk inode size. */
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 980bc48979e9..e9c058e3761c 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -21,7 +21,6 @@
>   #include <linux/migrate.h>
>   #include <linux/backing-dev.h>
>   #include <linux/freezer.h>
> -#include <linux/sched/mm.h>
>   
>   #include "xfs_format.h"
>   #include "xfs_log_format.h"
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index ae1e66fa3f61..1631cf4546f2 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -26,6 +26,7 @@ typedef __u32			xfs_nlink_t;
>   
>   #include <linux/semaphore.h>
>   #include <linux/mm.h>
> +#include <linux/sched/mm.h>
>   #include <linux/kernel.h>
>   #include <linux/blkdev.h>
>   #include <linux/slab.h>
> 
Looks ok.  Was moving the header include intentional?  Just cleanup,
maybe?  Other than that, looks good.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

* Re: [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
  2018-06-07  5:32   ` Dave Chinner
@ 2018-06-07 11:41     ` Brian Foster
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Foster @ 2018-06-07 11:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 03:32:36PM +1000, Dave Chinner wrote:
> On Thu, Jun 07, 2018 at 03:21:31PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When lockdep is enabled, it changes the type of the inode i_rwsem
> > semaphore before unlocking a newly instantiated inode. There is the
> > possibility that there is already a waiter on that inode lock by the
> > time we unlock the new inode, so having lockdep re-initialise the
> > lock is a vector for trouble.
> > 
> > Avoid this whole situation by setting up the i_rwsem lockdep class
> > at the same time we set up the XFS inode i_ilock classes and so the
> > VFS doesn't have to change the lock class itself when it is
> > potentially unsafe.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> 
> I just realised that the VFS equivalent patch has made it upstream,
> too, which would help explain this a bit more. Darrick, can you add
> this to the commit message:
> 
> "This change is necessary because the equivalent fixes to the VFS
> code made in commit 1e2e547a93a0 ("do d_instantiate/unlock_new_inode
> combinations safely") are not relevant to XFS as it has its own
> internal inode cache lookup and instantiation routines."
> 

The reference definitely helps, thanks. With that added:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
  2018-06-07  5:21 ` [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock Dave Chinner
  2018-06-07  5:56   ` Allison Henderson
@ 2018-06-07 11:41   ` Brian Foster
  2018-06-07 14:48     ` Darrick J. Wong
  2018-06-08  0:48     ` Dave Chinner
  1 sibling, 2 replies; 13+ messages in thread
From: Brian Foster @ 2018-06-07 11:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 03:21:32PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_reflink_convert_cow() manipulates the incore extent list
> in GFP_KERNEL context in the IO submission path whilst holding
> locked pages under writeback. This is a memory reclaim deadlock
> vector. This code is not in a transaction, so any memory allocations
> it makes aren't protected via the memalloc_nofs_save() context that
> transactions carry.
> 
> Hence we need to run this call under memalloc_nofs_save() context to
> prevent potential memory allocations from being run as GFP_KERNEL
> and deadlocking.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---

Looks fine modulo the header thing Allison already pointed out:

Reviewed-by: Brian Foster <bfoster@redhat.com>

BTW, shouldn't we also be using XFS_TRANS_NOFS in
xfs_iomap_write_allocate()?

Brian

>  fs/xfs/xfs_aops.c  | 11 +++++++++++
>  fs/xfs/xfs_buf.c   |  1 -
>  fs/xfs/xfs_linux.h |  1 +
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 767d53222f31..1eb625fdcb1e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -531,8 +531,19 @@ xfs_submit_ioend(
>  {
>  	/* Convert CoW extents to regular */
>  	if (!status && ioend->io_type == XFS_IO_COW) {
> +		/*
> +		 * Yuk. This can do memory allocation, but is not a
> +		 * transactional operation so everything is done in GFP_KERNEL
> +		 * context. That can deadlock, because we hold pages in
> +		 * writeback state and GFP_KERNEL allocations can block on them.
> +		 * Hence we must operate in nofs conditions here.
> +		 */
> +		unsigned nofs_flag;
> +
> +		nofs_flag = memalloc_nofs_save();
>  		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
>  				ioend->io_offset, ioend->io_size);
> +		memalloc_nofs_restore(nofs_flag);
>  	}
>  
>  	/* Reserve log space if we might write beyond the on-disk inode size. */
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 980bc48979e9..e9c058e3761c 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -21,7 +21,6 @@
>  #include <linux/migrate.h>
>  #include <linux/backing-dev.h>
>  #include <linux/freezer.h>
> -#include <linux/sched/mm.h>
>  
>  #include "xfs_format.h"
>  #include "xfs_log_format.h"
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index ae1e66fa3f61..1631cf4546f2 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -26,6 +26,7 @@ typedef __u32			xfs_nlink_t;
>  
>  #include <linux/semaphore.h>
>  #include <linux/mm.h>
> +#include <linux/sched/mm.h>
>  #include <linux/kernel.h>
>  #include <linux/blkdev.h>
>  #include <linux/slab.h>
> -- 
> 2.17.0
> 

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
  2018-06-07  5:56   ` Allison Henderson
@ 2018-06-07 14:46     ` Darrick J. Wong
  2018-06-07 22:08       ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2018-06-07 14:46 UTC (permalink / raw)
  To: Allison Henderson; +Cc: Dave Chinner, linux-xfs

On Wed, Jun 06, 2018 at 10:56:50PM -0700, Allison Henderson wrote:
> On 06/06/2018 10:21 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xfs_reflink_convert_cow() manipulates the incore extent list
> > in GFP_KERNEL context in the IO submission path whilst holding
> > locked pages under writeback. This is a memory reclaim deadlock
> > vector. This code is not in a transaction, so any memory allocations
> > it makes aren't protected via the memalloc_nofs_save() context that
> > transactions carry.
> > 
> > Hence we need to run this call under memalloc_nofs_save() context to
> > prevent potential memory allocations from being run as GFP_KERNEL
> > and deadlocking.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> >   fs/xfs/xfs_aops.c  | 11 +++++++++++
> >   fs/xfs/xfs_buf.c   |  1 -
> >   fs/xfs/xfs_linux.h |  1 +
> >   3 files changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 767d53222f31..1eb625fdcb1e 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -531,8 +531,19 @@ xfs_submit_ioend(
> >   {
> >   	/* Convert CoW extents to regular */
> >   	if (!status && ioend->io_type == XFS_IO_COW) {
> > +		/*
> > +		 * Yuk. This can do memory allocation, but is not a
> > +		 * transactional operation so everything is done in GFP_KERNEL
> > +		 * context. That can deadlock, because we hold pages in
> > +		 * writeback state and GFP_KERNEL allocations can block on them.
> > +		 * Hence we must operate in nofs conditions here.
> > +		 */
> > +		unsigned nofs_flag;
> > +
> > +		nofs_flag = memalloc_nofs_save();
> >   		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
> >   				ioend->io_offset, ioend->io_size);
> > +		memalloc_nofs_restore(nofs_flag);

DOH. :)

> >   	}
> >   	/* Reserve log space if we might write beyond the on-disk inode size. */
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 980bc48979e9..e9c058e3761c 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -21,7 +21,6 @@
> >   #include <linux/migrate.h>
> >   #include <linux/backing-dev.h>
> >   #include <linux/freezer.h>
> > -#include <linux/sched/mm.h>
> >   #include "xfs_format.h"
> >   #include "xfs_log_format.h"
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index ae1e66fa3f61..1631cf4546f2 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -26,6 +26,7 @@ typedef __u32			xfs_nlink_t;
> >   #include <linux/semaphore.h>
> >   #include <linux/mm.h>
> > +#include <linux/sched/mm.h>
> >   #include <linux/kernel.h>
> >   #include <linux/blkdev.h>
> >   #include <linux/slab.h>
> > 
> Looks ok.  Was moving the header include intentional?  Just cleanup, maybe?
> Other than that, looks good.

I can't speak for Dave, but I'll point out that memalloc_nofs_restore is
declared in linux/sched/mm.h, so the #include hoist makes the symbol
available to the aops code in such a manner that now it's available to
all the xfs code so that we don't have to remember this...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
  2018-06-07 11:41   ` Brian Foster
@ 2018-06-07 14:48     ` Darrick J. Wong
  2018-06-08  0:48     ` Dave Chinner
  1 sibling, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2018-06-07 14:48 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Thu, Jun 07, 2018 at 07:41:40AM -0400, Brian Foster wrote:
> On Thu, Jun 07, 2018 at 03:21:32PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xfs_reflink_convert_cow() manipulates the incore extent list
> > in GFP_KERNEL context in the IO submission path whilst holding
> > locked pages under writeback. This is a memory reclaim deadlock
> > vector. This code is not in a transaction, so any memory allocations
> > it makes aren't protected via the memalloc_nofs_save() context that
> > transactions carry.
> > 
> > Hence we need to run this call under memalloc_nofs_save() context to
> > prevent potential memory allocations from being run as GFP_KERNEL
> > and deadlocking.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Looks fine modulo the header thing Allison already pointed out:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> BTW, shouldn't we also be using XFS_TRANS_NOFS in
> xfs_iomap_write_allocate()?

/me squints at the usage and thinks ... yes probably. :)

--D

> Brian
> 
> >  fs/xfs/xfs_aops.c  | 11 +++++++++++
> >  fs/xfs/xfs_buf.c   |  1 -
> >  fs/xfs/xfs_linux.h |  1 +
> >  3 files changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 767d53222f31..1eb625fdcb1e 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -531,8 +531,19 @@ xfs_submit_ioend(
> >  {
> >  	/* Convert CoW extents to regular */
> >  	if (!status && ioend->io_type == XFS_IO_COW) {
> > +		/*
> > +		 * Yuk. This can do memory allocation, but is not a
> > +		 * transactional operation so everything is done in GFP_KERNEL
> > +		 * context. That can deadlock, because we hold pages in
> > +		 * writeback state and GFP_KERNEL allocations can block on them.
> > +		 * Hence we must operate in nofs conditions here.
> > +		 */
> > +		unsigned nofs_flag;
> > +
> > +		nofs_flag = memalloc_nofs_save();
> >  		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
> >  				ioend->io_offset, ioend->io_size);
> > +		memalloc_nofs_restore(nofs_flag);
> >  	}
> >  
> >  	/* Reserve log space if we might write beyond the on-disk inode size. */
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 980bc48979e9..e9c058e3761c 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -21,7 +21,6 @@
> >  #include <linux/migrate.h>
> >  #include <linux/backing-dev.h>
> >  #include <linux/freezer.h>
> > -#include <linux/sched/mm.h>
> >  
> >  #include "xfs_format.h"
> >  #include "xfs_log_format.h"
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index ae1e66fa3f61..1631cf4546f2 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -26,6 +26,7 @@ typedef __u32			xfs_nlink_t;
> >  
> >  #include <linux/semaphore.h>
> >  #include <linux/mm.h>
> > +#include <linux/sched/mm.h>
> >  #include <linux/kernel.h>
> >  #include <linux/blkdev.h>
> >  #include <linux/slab.h>
> > -- 
> > 2.17.0
> > 

* Re: [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly
  2018-06-07  5:21 ` [PATCH 1/2] xfs: setup VFS i_rwsem lockdep state correctly Dave Chinner
  2018-06-07  5:32   ` Dave Chinner
  2018-06-07  5:50   ` Allison Henderson
@ 2018-06-07 14:53   ` Darrick J. Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2018-06-07 14:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 03:21:31PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When lockdep is enabled, it changes the type of the inode i_rwsem
> semaphore before unlocking a newly instantiated inode. There is the
> possibility that there is already a waiter on that inode lock by the
> time we unlock the new inode, so having lockdep re-initialise the
> lock is a vector for trouble.
> 
> Avoid this whole situation by setting up the i_rwsem lockdep class
> at the same time we set up the XFS inode i_ilock classes and so the
> VFS doesn't have to change the lock class itself when it is
> potentially unsafe.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>

With the commit message changes added,

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iops.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 29484091c0d2..3020c57fc125 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1258,6 +1258,14 @@ xfs_setup_inode(
>  	xfs_diflags_to_iflags(inode, ip);
>  
>  	if (S_ISDIR(inode->i_mode)) {
> +		/*
> +		 * We set the i_rwsem class here to avoid potential races with
> +		 * lockdep_annotate_inode_mutex_key() reinitialising the lock
> +		 * after a filehandle lookup has already found the inode in
> +		 * cache before it has been unlocked via unlock_new_inode().
> +		 */
> +		lockdep_set_class(&inode->i_rwsem,
> +				  &inode->i_sb->s_type->i_mutex_dir_key);
>  		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_dir_ilock_class);
>  		ip->d_ops = ip->i_mount->m_dir_inode_ops;
>  	} else {
> -- 
> 2.17.0
> 

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
  2018-06-07 14:46     ` Darrick J. Wong
@ 2018-06-07 22:08       ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2018-06-07 22:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, linux-xfs

On Thu, Jun 07, 2018 at 07:46:31AM -0700, Darrick J. Wong wrote:
> On Wed, Jun 06, 2018 at 10:56:50PM -0700, Allison Henderson wrote:
> > On 06/06/2018 10:21 PM, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > xfs_reflink_convert_cow() manipulates the incore extent list
> > > in GFP_KERNEL context in the IO submission path whilst holding
> > > locked pages under writeback. This is a memory reclaim deadlock
> > > vector. This code is not in a transaction, so any memory allocations
> > > it makes aren't protected via the memalloc_nofs_save() context that
> > > transactions carry.
> > > 
> > > Hence we need to run this call under memalloc_nofs_save() context to
> > > prevent potential memory allocations from being run as GFP_KERNEL
> > > and deadlocking.
> > > 
> > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >   fs/xfs/xfs_aops.c  | 11 +++++++++++
> > >   fs/xfs/xfs_buf.c   |  1 -
> > >   fs/xfs/xfs_linux.h |  1 +
> > >   3 files changed, 12 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > index 767d53222f31..1eb625fdcb1e 100644
> > > --- a/fs/xfs/xfs_aops.c
> > > +++ b/fs/xfs/xfs_aops.c
> > > @@ -531,8 +531,19 @@ xfs_submit_ioend(
> > >   {
> > >   	/* Convert CoW extents to regular */
> > >   	if (!status && ioend->io_type == XFS_IO_COW) {
> > > +		/*
> > > +		 * Yuk. This can do memory allocation, but is not a
> > > +		 * transactional operation so everything is done in GFP_KERNEL
> > > +		 * context. That can deadlock, because we hold pages in
> > > +		 * writeback state and GFP_KERNEL allocations can block on them.
> > > +		 * Hence we must operate in nofs conditions here.
> > > +		 */
> > > +		unsigned nofs_flag;
> > > +
> > > +		nofs_flag = memalloc_nofs_save();
> > >   		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
> > >   				ioend->io_offset, ioend->io_size);
> > > +		memalloc_nofs_restore(nofs_flag);
> 
> DOH. :)
> 
> > >   	}
> > >   	/* Reserve log space if we might write beyond the on-disk inode size. */
> > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > index 980bc48979e9..e9c058e3761c 100644
> > > --- a/fs/xfs/xfs_buf.c
> > > +++ b/fs/xfs/xfs_buf.c
> > > @@ -21,7 +21,6 @@
> > >   #include <linux/migrate.h>
> > >   #include <linux/backing-dev.h>
> > >   #include <linux/freezer.h>
> > > -#include <linux/sched/mm.h>
> > >   #include "xfs_format.h"
> > >   #include "xfs_log_format.h"
> > > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > > index ae1e66fa3f61..1631cf4546f2 100644
> > > --- a/fs/xfs/xfs_linux.h
> > > +++ b/fs/xfs/xfs_linux.h
> > > @@ -26,6 +26,7 @@ typedef __u32			xfs_nlink_t;
> > >   #include <linux/semaphore.h>
> > >   #include <linux/mm.h>
> > > +#include <linux/sched/mm.h>
> > >   #include <linux/kernel.h>
> > >   #include <linux/blkdev.h>
> > >   #include <linux/slab.h>
> > > 
> > Looks ok.  Was moving the header include intentional?  Just cleanup, maybe?
> > Other than that, looks good.
> 
> I can't speak for Dave, but I'll point out that memalloc_nofs_restore is
> declared in linux/sched/mm.h, so the #include hoist makes the symbol
> available to the aops code in such a manner that now it's available to
> all the xfs code so that we don't have to remember this...

*nod*

That's historically how we've handled OS level includes - it keeps
the XFS files to including XFS headers only and that makes things
like userspace code syncs a lot easier.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 2/2] xfs: xfs_reflink_convert_cow() memory allocation deadlock
  2018-06-07 11:41   ` Brian Foster
  2018-06-07 14:48     ` Darrick J. Wong
@ 2018-06-08  0:48     ` Dave Chinner
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2018-06-08  0:48 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Jun 07, 2018 at 07:41:40AM -0400, Brian Foster wrote:
> On Thu, Jun 07, 2018 at 03:21:32PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xfs_reflink_convert_cow() manipulates the incore extent list
> > in GFP_KERNEL context in the IO submission path whilst holding
> > locked pages under writeback. This is a memory reclaim deadlock
> > vector. This code is not in a transaction, so any memory allocations
> > it makes aren't protected via the memalloc_nofs_save() context that
> > transactions carry.
> > 
> > Hence we need to run this call under memalloc_nofs_save() context to
> > prevent potential memory allocations from being run as GFP_KERNEL
> > and deadlocking.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Looks fine modulo the header thing Allison already pointed out:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> BTW, shouldn't we also be using XFS_TRANS_NOFS in
> xfs_iomap_write_allocate()?

Most likely, yes.

However, I think the whole ->writepages path should be moved under
memalloc_nofs_save() context. I'm kinda waiting for the bufferhead
removal to land before auditing it fully and determining the scope
we should be covering by nofs alloc contexts. This was just a
drive-by patch to fix a problem that was reported...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
