All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: don't nest transactions when scanning for eofblocks
@ 2021-02-19  4:29 Darrick J. Wong
  2021-02-19 13:09 ` Brian Foster
  2021-02-19 17:23 ` [PATCH v2] " Darrick J. Wong
  0 siblings, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2021-02-19  4:29 UTC (permalink / raw)
  To: xfs; +Cc: Dave Chinner, Brian Foster, Christoph Hellwig

From: Darrick J. Wong <djwong@kernel.org>

Brian Foster reported a lockdep warning on xfs/167:

============================================
WARNING: possible recursive locking detected
5.11.0-rc4 #35 Tainted: G        W I
--------------------------------------------
fsstress/17733 is trying to acquire lock:
ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]

but task is already holding lock:
ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]

stack backtrace:
CPU: 38 PID: 17733 Comm: fsstress Tainted: G        W I       5.11.0-rc4 #35
Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
Call Trace:
 dump_stack+0x8b/0xb0
 __lock_acquire.cold+0x159/0x2ab
 lock_acquire+0x116/0x370
 xfs_trans_alloc+0x1ad/0x310 [xfs]
 xfs_free_eofblocks+0x104/0x1d0 [xfs]
 xfs_blockgc_scan_inode+0x24/0x60 [xfs]
 xfs_inode_walk_ag+0x202/0x4b0 [xfs]
 xfs_inode_walk+0x66/0xc0 [xfs]
 xfs_trans_alloc+0x160/0x310 [xfs]
 xfs_trans_alloc_inode+0x5f/0x160 [xfs]
 xfs_alloc_file_space+0x105/0x300 [xfs]
 xfs_file_fallocate+0x270/0x460 [xfs]
 vfs_fallocate+0x14d/0x3d0
 __x64_sys_fallocate+0x3e/0x70
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The cause of this is the new code that spurs a scan to garbage collect
speculative preallocations if we fail to reserve enough blocks while
allocating a transaction.  While the warning itself is a fairly benign
lockdep complaint, it does bring to light a potential livelock.

Specifically, when we kick off that scan, we're still holding onto the
transaction's log reservation.  If the blockgc scan finds something to
free, it will need its own transaction, which means that it can block on
the log grant.  This means that if there are enough writer threads to
take all the log reservation space with that first transaction, the
second reservation attempts will all block on log space that cannot be
freed, leading to a livelock.

Fix this by freeing the transaction and jumping back to xfs_trans_alloc
like this patch in the V4 submission[1].

[1] https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/

Fixes: a1a7d05a0576 ("xfs: flush speculative space allocations when we run out of space")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_trans.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 44f72c09c203..377f3961d7ed 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -260,6 +260,7 @@ xfs_trans_alloc(
 	struct xfs_trans	**tpp)
 {
 	struct xfs_trans	*tp;
+	bool			want_retry = true;
 	int			error;
 
 	/*
@@ -267,6 +268,7 @@ xfs_trans_alloc(
 	 * GFP_NOFS allocation context so that we avoid lockdep false positives
 	 * by doing GFP_KERNEL allocations inside sb_start_intwrite().
 	 */
+retry:
 	tp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL);
 	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
 		sb_start_intwrite(mp->m_super);
@@ -289,7 +291,9 @@ xfs_trans_alloc(
 	tp->t_firstblock = NULLFSBLOCK;
 
 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
-	if (error == -ENOSPC) {
+	if (error == -ENOSPC && want_retry) {
+		xfs_trans_cancel(tp);
+
 		/*
 		 * We weren't able to reserve enough space for the transaction.
 		 * Flush the other speculative space allocations to free space.
@@ -297,8 +301,11 @@ xfs_trans_alloc(
 		 * other locks.
 		 */
 		error = xfs_blockgc_free_space(mp, NULL);
-		if (!error)
-			error = xfs_trans_reserve(tp, resp, blocks, rtextents);
+		if (error)
+			return error;
+
+		want_retry = false;
+		goto retry;
 	}
 	if (error) {
 		xfs_trans_cancel(tp);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: don't nest transactions when scanning for eofblocks
  2021-02-19  4:29 [PATCH] xfs: don't nest transactions when scanning for eofblocks Darrick J. Wong
@ 2021-02-19 13:09 ` Brian Foster
  2021-02-19 17:23   ` Darrick J. Wong
  2021-02-19 17:23 ` [PATCH v2] " Darrick J. Wong
  1 sibling, 1 reply; 7+ messages in thread
From: Brian Foster @ 2021-02-19 13:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, Dave Chinner, Christoph Hellwig

On Thu, Feb 18, 2021 at 08:29:40PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Brian Foster reported a lockdep warning on xfs/167:
> 
> ============================================
> WARNING: possible recursive locking detected
> 5.11.0-rc4 #35 Tainted: G        W I
> --------------------------------------------
> fsstress/17733 is trying to acquire lock:
> ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]
> 
> but task is already holding lock:
> ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]
> 
> stack backtrace:
> CPU: 38 PID: 17733 Comm: fsstress Tainted: G        W I       5.11.0-rc4 #35
> Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> Call Trace:
>  dump_stack+0x8b/0xb0
>  __lock_acquire.cold+0x159/0x2ab
>  lock_acquire+0x116/0x370
>  xfs_trans_alloc+0x1ad/0x310 [xfs]
>  xfs_free_eofblocks+0x104/0x1d0 [xfs]
>  xfs_blockgc_scan_inode+0x24/0x60 [xfs]
>  xfs_inode_walk_ag+0x202/0x4b0 [xfs]
>  xfs_inode_walk+0x66/0xc0 [xfs]
>  xfs_trans_alloc+0x160/0x310 [xfs]
>  xfs_trans_alloc_inode+0x5f/0x160 [xfs]
>  xfs_alloc_file_space+0x105/0x300 [xfs]
>  xfs_file_fallocate+0x270/0x460 [xfs]
>  vfs_fallocate+0x14d/0x3d0
>  __x64_sys_fallocate+0x3e/0x70
>  do_syscall_64+0x33/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The cause of this is the new code that spurs a scan to garbage collect
> speculative preallocations if we fail to reserve enough blocks while
> allocating a transaction.  While the warning itself is a fairly benign
> lockdep complaint, it does bring to light a potential livelock.
> 
> Specifically, when we kick off that scan, we're still holding onto the
> transaction's log reservation.  If the blockgc scan finds something to
> free, it will need its own transaction, which means that it can block on
> the log grant.  This means that if there are enough writer threads to
> take all the log reservation space with that first transaction, the
> second reservation attempts will all block on log space that cannot be
> freed, leading to a livelock.
> 

The text above around a prospective livelock doesn't seem accurate.
Otherwise the code looks fine to me. I don't have a preference between
this patch or the other one...

Brian

> Fix this by freeing the transaction and jumping back to xfs_trans_alloc
> like this patch in the V4 submission[1].
> 
> [1] https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/
> 
> Fixes: a1a7d05a0576 ("xfs: flush speculative space allocations when we run out of space")
> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_trans.c |   13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 44f72c09c203..377f3961d7ed 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -260,6 +260,7 @@ xfs_trans_alloc(
>  	struct xfs_trans	**tpp)
>  {
>  	struct xfs_trans	*tp;
> +	bool			want_retry = true;
>  	int			error;
>  
>  	/*
> @@ -267,6 +268,7 @@ xfs_trans_alloc(
>  	 * GFP_NOFS allocation context so that we avoid lockdep false positives
>  	 * by doing GFP_KERNEL allocations inside sb_start_intwrite().
>  	 */
> +retry:
>  	tp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL);
>  	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
>  		sb_start_intwrite(mp->m_super);
> @@ -289,7 +291,9 @@ xfs_trans_alloc(
>  	tp->t_firstblock = NULLFSBLOCK;
>  
>  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> -	if (error == -ENOSPC) {
> +	if (error == -ENOSPC && want_retry) {
> +		xfs_trans_cancel(tp);
> +
>  		/*
>  		 * We weren't able to reserve enough space for the transaction.
>  		 * Flush the other speculative space allocations to free space.
> @@ -297,8 +301,11 @@ xfs_trans_alloc(
>  		 * other locks.
>  		 */
>  		error = xfs_blockgc_free_space(mp, NULL);
> -		if (!error)
> -			error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> +		if (error)
> +			return error;
> +
> +		want_retry = false;
> +		goto retry;
>  	}
>  	if (error) {
>  		xfs_trans_cancel(tp);
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: don't nest transactions when scanning for eofblocks
  2021-02-19 13:09 ` Brian Foster
@ 2021-02-19 17:23   ` Darrick J. Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2021-02-19 17:23 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs, Dave Chinner, Christoph Hellwig

On Fri, Feb 19, 2021 at 08:09:53AM -0500, Brian Foster wrote:
> On Thu, Feb 18, 2021 at 08:29:40PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Brian Foster reported a lockdep warning on xfs/167:
> > 
> > ============================================
> > WARNING: possible recursive locking detected
> > 5.11.0-rc4 #35 Tainted: G        W I
> > --------------------------------------------
> > fsstress/17733 is trying to acquire lock:
> > ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]
> > 
> > but task is already holding lock:
> > ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]
> > 
> > stack backtrace:
> > CPU: 38 PID: 17733 Comm: fsstress Tainted: G        W I       5.11.0-rc4 #35
> > Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> > Call Trace:
> >  dump_stack+0x8b/0xb0
> >  __lock_acquire.cold+0x159/0x2ab
> >  lock_acquire+0x116/0x370
> >  xfs_trans_alloc+0x1ad/0x310 [xfs]
> >  xfs_free_eofblocks+0x104/0x1d0 [xfs]
> >  xfs_blockgc_scan_inode+0x24/0x60 [xfs]
> >  xfs_inode_walk_ag+0x202/0x4b0 [xfs]
> >  xfs_inode_walk+0x66/0xc0 [xfs]
> >  xfs_trans_alloc+0x160/0x310 [xfs]
> >  xfs_trans_alloc_inode+0x5f/0x160 [xfs]
> >  xfs_alloc_file_space+0x105/0x300 [xfs]
> >  xfs_file_fallocate+0x270/0x460 [xfs]
> >  vfs_fallocate+0x14d/0x3d0
> >  __x64_sys_fallocate+0x3e/0x70
> >  do_syscall_64+0x33/0x40
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > The cause of this is the new code that spurs a scan to garbage collect
> > speculative preallocations if we fail to reserve enough blocks while
> > allocating a transaction.  While the warning itself is a fairly benign
> > lockdep complaint, it does bring to light a potential livelock.
> > 
> > Specifically, when we kick off that scan, we're still holding onto the
> > transaction's log reservation.  If the blockgc scan finds something to
> > free, it will need its own transaction, which means that it can block on
> > the log grant.  This means that if there are enough writer threads to
> > take all the log reservation space with that first transaction, the
> > second reservation attempts will all block on log space that cannot be
> > freed, leading to a livelock.
> > 
> 
> The text above around a prospective livelock doesn't seem accurate.
> Otherwise the code looks fine to me. I don't have a preference between
> this patch or the other one...

Doh, that was leftover from when I thought there was a real log
reservation deadlock opportunity.  Will post v2.

--D

> Brian
> 
> > Fix this by freeing the transaction and jumping back to xfs_trans_alloc
> > like this patch in the V4 submission[1].
> > 
> > [1] https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/
> > 
> > Fixes: a1a7d05a0576 ("xfs: flush speculative space allocations when we run out of space")
> > Reported-by: Brian Foster <bfoster@redhat.com>
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_trans.c |   13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index 44f72c09c203..377f3961d7ed 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -260,6 +260,7 @@ xfs_trans_alloc(
> >  	struct xfs_trans	**tpp)
> >  {
> >  	struct xfs_trans	*tp;
> > +	bool			want_retry = true;
> >  	int			error;
> >  
> >  	/*
> > @@ -267,6 +268,7 @@ xfs_trans_alloc(
> >  	 * GFP_NOFS allocation context so that we avoid lockdep false positives
> >  	 * by doing GFP_KERNEL allocations inside sb_start_intwrite().
> >  	 */
> > +retry:
> >  	tp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL);
> >  	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
> >  		sb_start_intwrite(mp->m_super);
> > @@ -289,7 +291,9 @@ xfs_trans_alloc(
> >  	tp->t_firstblock = NULLFSBLOCK;
> >  
> >  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > -	if (error == -ENOSPC) {
> > +	if (error == -ENOSPC && want_retry) {
> > +		xfs_trans_cancel(tp);
> > +
> >  		/*
> >  		 * We weren't able to reserve enough space for the transaction.
> >  		 * Flush the other speculative space allocations to free space.
> > @@ -297,8 +301,11 @@ xfs_trans_alloc(
> >  		 * other locks.
> >  		 */
> >  		error = xfs_blockgc_free_space(mp, NULL);
> > -		if (!error)
> > -			error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > +		if (error)
> > +			return error;
> > +
> > +		want_retry = false;
> > +		goto retry;
> >  	}
> >  	if (error) {
> >  		xfs_trans_cancel(tp);
> > 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] xfs: don't nest transactions when scanning for eofblocks
  2021-02-19  4:29 [PATCH] xfs: don't nest transactions when scanning for eofblocks Darrick J. Wong
  2021-02-19 13:09 ` Brian Foster
@ 2021-02-19 17:23 ` Darrick J. Wong
  2021-02-19 18:12   ` Brian Foster
                     ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: Darrick J. Wong @ 2021-02-19 17:23 UTC (permalink / raw)
  To: xfs; +Cc: Dave Chinner, Brian Foster, Christoph Hellwig

From: Darrick J. Wong <djwong@kernel.org>

Brian Foster reported a lockdep warning on xfs/167:

============================================
WARNING: possible recursive locking detected
5.11.0-rc4 #35 Tainted: G        W I
--------------------------------------------
fsstress/17733 is trying to acquire lock:
ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]

but task is already holding lock:
ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]

stack backtrace:
CPU: 38 PID: 17733 Comm: fsstress Tainted: G        W I       5.11.0-rc4 #35
Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
Call Trace:
 dump_stack+0x8b/0xb0
 __lock_acquire.cold+0x159/0x2ab
 lock_acquire+0x116/0x370
 xfs_trans_alloc+0x1ad/0x310 [xfs]
 xfs_free_eofblocks+0x104/0x1d0 [xfs]
 xfs_blockgc_scan_inode+0x24/0x60 [xfs]
 xfs_inode_walk_ag+0x202/0x4b0 [xfs]
 xfs_inode_walk+0x66/0xc0 [xfs]
 xfs_trans_alloc+0x160/0x310 [xfs]
 xfs_trans_alloc_inode+0x5f/0x160 [xfs]
 xfs_alloc_file_space+0x105/0x300 [xfs]
 xfs_file_fallocate+0x270/0x460 [xfs]
 vfs_fallocate+0x14d/0x3d0
 __x64_sys_fallocate+0x3e/0x70
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The cause of this is the new code that spurs a scan to garbage collect
speculative preallocations if we fail to reserve enough blocks while
allocating a transaction.  While the warning itself is a fairly benign
lockdep complaint, it does expose a potential livelock if the rwsem
behavior ever changes with regards to nesting read locks when someone's
waiting for a write lock.

Fix this by freeing the transaction and jumping back to xfs_trans_alloc
like this patch in the V4 submission[1].

[1] https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/

Fixes: a1a7d05a0576 ("xfs: flush speculative space allocations when we run out of space")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v2: fix commit message
---
 fs/xfs/xfs_trans.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 44f72c09c203..377f3961d7ed 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -260,6 +260,7 @@ xfs_trans_alloc(
 	struct xfs_trans	**tpp)
 {
 	struct xfs_trans	*tp;
+	bool			want_retry = true;
 	int			error;
 
 	/*
@@ -267,6 +268,7 @@ xfs_trans_alloc(
 	 * GFP_NOFS allocation context so that we avoid lockdep false positives
 	 * by doing GFP_KERNEL allocations inside sb_start_intwrite().
 	 */
+retry:
 	tp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL);
 	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
 		sb_start_intwrite(mp->m_super);
@@ -289,7 +291,9 @@ xfs_trans_alloc(
 	tp->t_firstblock = NULLFSBLOCK;
 
 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
-	if (error == -ENOSPC) {
+	if (error == -ENOSPC && want_retry) {
+		xfs_trans_cancel(tp);
+
 		/*
 		 * We weren't able to reserve enough space for the transaction.
 		 * Flush the other speculative space allocations to free space.
@@ -297,8 +301,11 @@ xfs_trans_alloc(
 		 * other locks.
 		 */
 		error = xfs_blockgc_free_space(mp, NULL);
-		if (!error)
-			error = xfs_trans_reserve(tp, resp, blocks, rtextents);
+		if (error)
+			return error;
+
+		want_retry = false;
+		goto retry;
 	}
 	if (error) {
 		xfs_trans_cancel(tp);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] xfs: don't nest transactions when scanning for eofblocks
  2021-02-19 17:23 ` [PATCH v2] " Darrick J. Wong
@ 2021-02-19 18:12   ` Brian Foster
  2021-02-20  3:44   ` Allison Henderson
  2021-02-25  7:45   ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: Brian Foster @ 2021-02-19 18:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, Dave Chinner, Christoph Hellwig

On Fri, Feb 19, 2021 at 09:23:41AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Brian Foster reported a lockdep warning on xfs/167:
> 
> ============================================
> WARNING: possible recursive locking detected
> 5.11.0-rc4 #35 Tainted: G        W I
> --------------------------------------------
> fsstress/17733 is trying to acquire lock:
> ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]
> 
> but task is already holding lock:
> ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]
> 
> stack backtrace:
> CPU: 38 PID: 17733 Comm: fsstress Tainted: G        W I       5.11.0-rc4 #35
> Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> Call Trace:
>  dump_stack+0x8b/0xb0
>  __lock_acquire.cold+0x159/0x2ab
>  lock_acquire+0x116/0x370
>  xfs_trans_alloc+0x1ad/0x310 [xfs]
>  xfs_free_eofblocks+0x104/0x1d0 [xfs]
>  xfs_blockgc_scan_inode+0x24/0x60 [xfs]
>  xfs_inode_walk_ag+0x202/0x4b0 [xfs]
>  xfs_inode_walk+0x66/0xc0 [xfs]
>  xfs_trans_alloc+0x160/0x310 [xfs]
>  xfs_trans_alloc_inode+0x5f/0x160 [xfs]
>  xfs_alloc_file_space+0x105/0x300 [xfs]
>  xfs_file_fallocate+0x270/0x460 [xfs]
>  vfs_fallocate+0x14d/0x3d0
>  __x64_sys_fallocate+0x3e/0x70
>  do_syscall_64+0x33/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The cause of this is the new code that spurs a scan to garbage collect
> speculative preallocations if we fail to reserve enough blocks while
> allocating a transaction.  While the warning itself is a fairly benign
> lockdep complaint, it does expose a potential livelock if the rwsem
> behavior ever changes with regards to nesting read locks when someone's
> waiting for a write lock.
> 
> Fix this by freeing the transaction and jumping back to xfs_trans_alloc
> like this patch in the V4 submission[1].
> 
> [1] https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/
> 
> Fixes: a1a7d05a0576 ("xfs: flush speculative space allocations when we run out of space")
> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> v2: fix commit message
> ---

Looks good:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_trans.c |   13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 44f72c09c203..377f3961d7ed 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -260,6 +260,7 @@ xfs_trans_alloc(
>  	struct xfs_trans	**tpp)
>  {
>  	struct xfs_trans	*tp;
> +	bool			want_retry = true;
>  	int			error;
>  
>  	/*
> @@ -267,6 +268,7 @@ xfs_trans_alloc(
>  	 * GFP_NOFS allocation context so that we avoid lockdep false positives
>  	 * by doing GFP_KERNEL allocations inside sb_start_intwrite().
>  	 */
> +retry:
>  	tp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL);
>  	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
>  		sb_start_intwrite(mp->m_super);
> @@ -289,7 +291,9 @@ xfs_trans_alloc(
>  	tp->t_firstblock = NULLFSBLOCK;
>  
>  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> -	if (error == -ENOSPC) {
> +	if (error == -ENOSPC && want_retry) {
> +		xfs_trans_cancel(tp);
> +
>  		/*
>  		 * We weren't able to reserve enough space for the transaction.
>  		 * Flush the other speculative space allocations to free space.
> @@ -297,8 +301,11 @@ xfs_trans_alloc(
>  		 * other locks.
>  		 */
>  		error = xfs_blockgc_free_space(mp, NULL);
> -		if (!error)
> -			error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> +		if (error)
> +			return error;
> +
> +		want_retry = false;
> +		goto retry;
>  	}
>  	if (error) {
>  		xfs_trans_cancel(tp);
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] xfs: don't nest transactions when scanning for eofblocks
  2021-02-19 17:23 ` [PATCH v2] " Darrick J. Wong
  2021-02-19 18:12   ` Brian Foster
@ 2021-02-20  3:44   ` Allison Henderson
  2021-02-25  7:45   ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: Allison Henderson @ 2021-02-20  3:44 UTC (permalink / raw)
  To: Darrick J. Wong, xfs; +Cc: Dave Chinner, Brian Foster, Christoph Hellwig



On 2/19/21 10:23 AM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Brian Foster reported a lockdep warning on xfs/167:
> 
> ============================================
> WARNING: possible recursive locking detected
> 5.11.0-rc4 #35 Tainted: G        W I
> --------------------------------------------
> fsstress/17733 is trying to acquire lock:
> ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]
> 
> but task is already holding lock:
> ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]
> 
> stack backtrace:
> CPU: 38 PID: 17733 Comm: fsstress Tainted: G        W I       5.11.0-rc4 #35
> Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> Call Trace:
>   dump_stack+0x8b/0xb0
>   __lock_acquire.cold+0x159/0x2ab
>   lock_acquire+0x116/0x370
>   xfs_trans_alloc+0x1ad/0x310 [xfs]
>   xfs_free_eofblocks+0x104/0x1d0 [xfs]
>   xfs_blockgc_scan_inode+0x24/0x60 [xfs]
>   xfs_inode_walk_ag+0x202/0x4b0 [xfs]
>   xfs_inode_walk+0x66/0xc0 [xfs]
>   xfs_trans_alloc+0x160/0x310 [xfs]
>   xfs_trans_alloc_inode+0x5f/0x160 [xfs]
>   xfs_alloc_file_space+0x105/0x300 [xfs]
>   xfs_file_fallocate+0x270/0x460 [xfs]
>   vfs_fallocate+0x14d/0x3d0
>   __x64_sys_fallocate+0x3e/0x70
>   do_syscall_64+0x33/0x40
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The cause of this is the new code that spurs a scan to garbage collect
> speculative preallocations if we fail to reserve enough blocks while
> allocating a transaction.  While the warning itself is a fairly benign
> lockdep complaint, it does expose a potential livelock if the rwsem
> behavior ever changes with regards to nesting read locks when someone's
> waiting for a write lock.
> 
> Fix this by freeing the transaction and jumping back to xfs_trans_alloc
> like this patch in the V4 submission[1].
> 
> [1] https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/__;!!GqivPVa7Brio!ONLN-B-M-uqN8aJAN8zMDHDQZ6wwDyF4BSpjkT9j3mV2Zxe5zVD0vgjTWvPFRO2tzEpN$
> 
> Fixes: a1a7d05a0576 ("xfs: flush speculative space allocations when we run out of space")
> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Ok, makes sense
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
> v2: fix commit message
> ---
>   fs/xfs/xfs_trans.c |   13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 44f72c09c203..377f3961d7ed 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -260,6 +260,7 @@ xfs_trans_alloc(
>   	struct xfs_trans	**tpp)
>   {
>   	struct xfs_trans	*tp;
> +	bool			want_retry = true;
>   	int			error;
>   
>   	/*
> @@ -267,6 +268,7 @@ xfs_trans_alloc(
>   	 * GFP_NOFS allocation context so that we avoid lockdep false positives
>   	 * by doing GFP_KERNEL allocations inside sb_start_intwrite().
>   	 */
> +retry:
>   	tp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL);
>   	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
>   		sb_start_intwrite(mp->m_super);
> @@ -289,7 +291,9 @@ xfs_trans_alloc(
>   	tp->t_firstblock = NULLFSBLOCK;
>   
>   	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> -	if (error == -ENOSPC) {
> +	if (error == -ENOSPC && want_retry) {
> +		xfs_trans_cancel(tp);
> +
>   		/*
>   		 * We weren't able to reserve enough space for the transaction.
>   		 * Flush the other speculative space allocations to free space.
> @@ -297,8 +301,11 @@ xfs_trans_alloc(
>   		 * other locks.
>   		 */
>   		error = xfs_blockgc_free_space(mp, NULL);
> -		if (!error)
> -			error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> +		if (error)
> +			return error;
> +
> +		want_retry = false;
> +		goto retry;
>   	}
>   	if (error) {
>   		xfs_trans_cancel(tp);
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] xfs: don't nest transactions when scanning for eofblocks
  2021-02-19 17:23 ` [PATCH v2] " Darrick J. Wong
  2021-02-19 18:12   ` Brian Foster
  2021-02-20  3:44   ` Allison Henderson
@ 2021-02-25  7:45   ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2021-02-25  7:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, Dave Chinner, Brian Foster, Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-02-25  7:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-19  4:29 [PATCH] xfs: don't nest transactions when scanning for eofblocks Darrick J. Wong
2021-02-19 13:09 ` Brian Foster
2021-02-19 17:23   ` Darrick J. Wong
2021-02-19 17:23 ` [PATCH v2] " Darrick J. Wong
2021-02-19 18:12   ` Brian Foster
2021-02-20  3:44   ` Allison Henderson
2021-02-25  7:45   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.