linux-btrfs.vger.kernel.org archive mirror
* [PATCH] btrfs: fix deadlock when reserving space during defrag
@ 2022-01-20 14:27 fdmanana
  2022-01-20 16:43 ` David Sterba
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: fdmanana @ 2022-01-20 14:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When defragging we can end up collecting a range for defrag that already
has pages under delalloc (dirty), as long as the respective extent map
for their range is not mapped to a hole or a prealloc extent, and the
extent map is not from an old generation.

Most of the time that is harmless, from a functional perspective at
least, but it can result in a deadlock:

1) At defrag_collect_targets() we find an extent map that meets all
   requirements but there's delalloc for the range it covers, and we add
   its range to the list of ranges to defrag;

2) The defrag_collect_targets() function is called at defrag_one_range(),
   after it locked a range that overlaps the range of the extent map;

3) At defrag_one_range(), while the range is still locked, we call
   defrag_one_locked_target() for the range associated with the extent
   map we collected at step 1);

4) Then finally at defrag_one_locked_target() we do a call to
   btrfs_delalloc_reserve_space(), which will reserve data and metadata
   space. If the space reservations cannot be satisfied right away, the
   flusher might kick in, start flushing delalloc and wait for the
   respective ordered extents to complete. If this happens we will
   deadlock, because both flushing delalloc and finishing an ordered
   extent require locking the range in the inode's io tree, which was
   already locked at defrag_collect_targets().

So fix this by skipping extent maps for which there's already delalloc.

Fixes: eb793cf857828d ("btrfs: defrag: introduce helper to collect target file extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 550d8f2dfa37..0082e9a60bfc 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1211,6 +1211,35 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		if (em->generation < newer_than)
 			goto next;
 
+		/*
+		 * Our start offset might be in the middle of an existing extent
+		 * map, so take that into account.
+		 */
+		range_len = em->len - (cur - em->start);
+		/*
+		 * If this range of the extent map is already flagged for delalloc,
+		 * skip it, because:
+		 *
+		 * 1) We could deadlock later, when trying to reserve space for
+		 *    delalloc, because in case we can't immediately reserve space
+		 *    the flusher can start delalloc and wait for the respective
+		 *    ordered extents to complete. The deadlock would happen
+		 *    because we do the space reservation while holding the range
+		 *    locked, and starting writeback, or finishing an ordered
+		 *    extent, requires locking the range;
+		 *
+		 * 2) If there's delalloc there, it means there's dirty pages for
+		 *    which writeback has not started yet (we clean the delalloc
+		 *    flag when starting writeback and after creating an ordered
+		 *    extent). If we mark pages in an adjacent range for defrag,
+		 *    then we will have a larger contiguous range for delalloc,
+		 *    very likely resulting in a larger extent after writeback is
+		 *    triggered (except in a case of free space fragmentation).
+		 */
+		if (test_range_bit(&inode->io_tree, cur, cur + range_len - 1,
+				   EXTENT_DELALLOC, 0, NULL))
+			goto next;
+
 		/*
 		 * For do_compress case, we want to compress all valid file
 		 * extents, thus no @extent_thresh or mergeable check.
@@ -1219,7 +1248,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 			goto add;
 
 		/* Skip too large extent */
-		if (em->len >= extent_thresh)
+		if (range_len >= extent_thresh)
 			goto next;
 
 		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: fix deadlock when reserving space during defrag
  2022-01-20 14:27 [PATCH] btrfs: fix deadlock when reserving space during defrag fdmanana
@ 2022-01-20 16:43 ` David Sterba
  2022-01-20 23:39 ` Qu Wenruo
  2022-01-25  9:44 ` Qu Wenruo
  2 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2022-01-20 16:43 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Thu, Jan 20, 2022 at 02:27:56PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When defragging we can end up collecting a range for defrag that has
> already pages under delalloc (dirty), as long as the respective extent
> map for their range is not mapped to a hole, a prealloc extent or
> the extent map is from an old generation.
> 
> Most of the time that is harmless from a functional perspective at
> least, however it can result in a deadlock:
> 
> 1) At defrag_collect_targets() we find an extent map that meets all
>    requirements but there's delalloc for the range it covers, and we add
>    its range to list of ranges to defrag;
> 
> 2) The defrag_collect_targets() function is called at defrag_one_range(),
>    after it locked a range that overlaps the range of the extent map;
> 
> 3) At defrag_one_range(), while the range is still locked, we call
>    defrag_one_locked_target() for the range associated to the extent
>    map we collected at step 1);
> 
> 4) Then finally at defrag_one_locked_target() we do a call to
>    btrfs_delalloc_reserve_space(), which will reserve data and metadata
>    space. If the space reservations can not be satisfied right away, the
>    flusher might be kicked in and start flushing delalloc and wait for
>    the respective ordered extents to complete. If this happens we will
>    deadlock, because both flushing delalloc and finishing an ordered
>    extent, requires locking the range in the inode's io tree, which was
>    already locked at defrag_collect_targets().
> 
> So fix this by skipping extent maps for which there's already delalloc.
> 
> Fixes: eb793cf857828d ("btrfs: defrag: introduce helper to collect target file extents")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Thanks, added to misc-next and to the rest of the defrag fixes.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: fix deadlock when reserving space during defrag
  2022-01-20 14:27 [PATCH] btrfs: fix deadlock when reserving space during defrag fdmanana
  2022-01-20 16:43 ` David Sterba
@ 2022-01-20 23:39 ` Qu Wenruo
  2022-01-21 10:15   ` Filipe Manana
  2022-01-25  9:44 ` Qu Wenruo
  2 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2022-01-20 23:39 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2022/1/20 22:27, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When defragging we can end up collecting a range for defrag that has
> already pages under delalloc (dirty), as long as the respective extent
> map for their range is not mapped to a hole, a prealloc extent or
> the extent map is from an old generation.
>
> Most of the time that is harmless from a functional perspective at
> least, however it can result in a deadlock:
>
> 1) At defrag_collect_targets() we find an extent map that meets all
>     requirements but there's delalloc for the range it covers, and we add
>     its range to list of ranges to defrag;
>
> 2) The defrag_collect_targets() function is called at defrag_one_range(),
>     after it locked a range that overlaps the range of the extent map;
>
> 3) At defrag_one_range(), while the range is still locked, we call
>     defrag_one_locked_target() for the range associated to the extent
>     map we collected at step 1);
>
> 4) Then finally at defrag_one_locked_target() we do a call to
>     btrfs_delalloc_reserve_space(), which will reserve data and metadata
>     space. If the space reservations can not be satisfied right away, the
>     flusher might be kicked in and start flushing delalloc and wait for
>     the respective ordered extents to complete. If this happens we will
>     deadlock, because both flushing delalloc and finishing an ordered
>     extent, requires locking the range in the inode's io tree, which was
>     already locked at defrag_collect_targets().

That's the part I missed.

>
> So fix this by skipping extent maps for which there's already delalloc.
>
> Fixes: eb793cf857828d ("btrfs: defrag: introduce helper to collect target file extents")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

But I have some concern over one comment below.

> ---
>   fs/btrfs/ioctl.c | 31 ++++++++++++++++++++++++++++++-
>   1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 550d8f2dfa37..0082e9a60bfc 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1211,6 +1211,35 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>   		if (em->generation < newer_than)
>   			goto next;
>
> +		/*
> +		 * Our start offset might be in the middle of an existing extent
> +		 * map, so take that into account.
> +		 */
> +		range_len = em->len - (cur - em->start);
> +		/*
> +		 * If this range of the extent map is already flagged for delalloc,
> +		 * skip it, because:
> +		 *
> +		 * 1) We could deadlock later, when trying to reserve space for
> +		 *    delalloc, because in case we can't immediately reserve space
> +		 *    the flusher can start delalloc and wait for the respective
> +		 *    ordered extents to complete. The deadlock would happen
> +		 *    because we do the space reservation while holding the range
> +		 *    locked, and starting writeback, or finishing an ordered
> +		 *    extent, requires locking the range;
> +		 *
> +		 * 2) If there's delalloc there, it means there's dirty pages for
> +		 *    which writeback has not started yet (we clean the delalloc
> +		 *    flag when starting writeback and after creating an ordered
> +		 *    extent). If we mark pages in an adjacent range for defrag,
> +		 *    then we will have a larger contiguous range for delalloc,
> +		 *    very likely resulting in a larger extent after writeback is
> +		 *    triggered (except in a case of free space fragmentation).

If the page is already under writeback, won't we wait for it in
defrag_one_range() by defrag_prepare_one_page() and
wait_on_page_writeback()?

Thus I think this case is not really possible here.

Thanks,
Qu

> +		 */
> +		if (test_range_bit(&inode->io_tree, cur, cur + range_len - 1,
> +				   EXTENT_DELALLOC, 0, NULL))
> +			goto next;
> +
>   		/*
>   		 * For do_compress case, we want to compress all valid file
>   		 * extents, thus no @extent_thresh or mergeable check.
> @@ -1219,7 +1248,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>   			goto add;
>
>   		/* Skip too large extent */
> -		if (em->len >= extent_thresh)
> +		if (range_len >= extent_thresh)
>   			goto next;
>
>   		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: fix deadlock when reserving space during defrag
  2022-01-20 23:39 ` Qu Wenruo
@ 2022-01-21 10:15   ` Filipe Manana
  0 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2022-01-21 10:15 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jan 20, 2022 at 11:39 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2022/1/20 22:27, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When defragging we can end up collecting a range for defrag that has
> > already pages under delalloc (dirty), as long as the respective extent
> > map for their range is not mapped to a hole, a prealloc extent or
> > the extent map is from an old generation.
> >
> > Most of the time that is harmless from a functional perspective at
> > least, however it can result in a deadlock:
> >
> > 1) At defrag_collect_targets() we find an extent map that meets all
> >     requirements but there's delalloc for the range it covers, and we add
> >     its range to list of ranges to defrag;
> >
> > 2) The defrag_collect_targets() function is called at defrag_one_range(),
> >     after it locked a range that overlaps the range of the extent map;
> >
> > 3) At defrag_one_range(), while the range is still locked, we call
> >     defrag_one_locked_target() for the range associated to the extent
> >     map we collected at step 1);
> >
> > 4) Then finally at defrag_one_locked_target() we do a call to
> >     btrfs_delalloc_reserve_space(), which will reserve data and metadata
> >     space. If the space reservations can not be satisfied right away, the
> >     flusher might be kicked in and start flushing delalloc and wait for
> >     the respective ordered extents to complete. If this happens we will
> >     deadlock, because both flushing delalloc and finishing an ordered
> >     extent, requires locking the range in the inode's io tree, which was
> >     already locked at defrag_collect_targets().
>
> That's the part I missed.
>
> >
> > So fix this by skipping extent maps for which there's already delalloc.
> >
> > Fixes: eb793cf857828d ("btrfs: defrag: introduce helper to collect target file extents")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
>
> But I have some concern over one comment below.
>
> > ---
> >   fs/btrfs/ioctl.c | 31 ++++++++++++++++++++++++++++++-
> >   1 file changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 550d8f2dfa37..0082e9a60bfc 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -1211,6 +1211,35 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> >               if (em->generation < newer_than)
> >                       goto next;
> >
> > +             /*
> > +              * Our start offset might be in the middle of an existing extent
> > +              * map, so take that into account.
> > +              */
> > +             range_len = em->len - (cur - em->start);
> > +             /*
> > +              * If this range of the extent map is already flagged for delalloc,
> > +              * skip it, because:
> > +              *
> > +              * 1) We could deadlock later, when trying to reserve space for
> > +              *    delalloc, because in case we can't immediately reserve space
> > +              *    the flusher can start delalloc and wait for the respective
> > +              *    ordered extents to complete. The deadlock would happen
> > +              *    because we do the space reservation while holding the range
> > +              *    locked, and starting writeback, or finishing an ordered
> > +              *    extent, requires locking the range;
> > +              *
> > +              * 2) If there's delalloc there, it means there's dirty pages for
> > +              *    which writeback has not started yet (we clean the delalloc
> > +              *    flag when starting writeback and after creating an ordered
> > +              *    extent). If we mark pages in an adjacent range for defrag,
> > +              *    then we will have a larger contiguous range for delalloc,
> > +              *    very likely resulting in a larger extent after writeback is
> > +              *    triggered (except in a case of free space fragmentation).
>
> If the page is already under writeback, won't we wait for it in
> defrag_one_range() by defrag_prepare_one_page() and
> wait_on_page_writeback()?

Yes, we wait.

And? The comment is about ranges that are dirty but not under writeback yet.
Once writeback starts, EXTENT_DELALLOC is removed from the range (an
ordered extent created and a new extent map created).

>
> Thus I think this case is not really possible here.
>
> Thanks,
> Qu
>
> > +              */
> > +             if (test_range_bit(&inode->io_tree, cur, cur + range_len - 1,
> > +                                EXTENT_DELALLOC, 0, NULL))
> > +                     goto next;
> > +
> >               /*
> >                * For do_compress case, we want to compress all valid file
> >                * extents, thus no @extent_thresh or mergeable check.
> > @@ -1219,7 +1248,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> >                       goto add;
> >
> >               /* Skip too large extent */
> > -             if (em->len >= extent_thresh)
> > +             if (range_len >= extent_thresh)
> >                       goto next;
> >
> >               next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: fix deadlock when reserving space during defrag
  2022-01-20 14:27 [PATCH] btrfs: fix deadlock when reserving space during defrag fdmanana
  2022-01-20 16:43 ` David Sterba
  2022-01-20 23:39 ` Qu Wenruo
@ 2022-01-25  9:44 ` Qu Wenruo
  2022-01-25 10:00   ` Filipe Manana
  2 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2022-01-25  9:44 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2022/1/20 22:27, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When defragging we can end up collecting a range for defrag that has
> already pages under delalloc (dirty), as long as the respective extent
> map for their range is not mapped to a hole, a prealloc extent or
> the extent map is from an old generation.
>
> Most of the time that is harmless from a functional perspective at
> least, however it can result in a deadlock:
>
> 1) At defrag_collect_targets() we find an extent map that meets all
>     requirements but there's delalloc for the range it covers, and we add
>     its range to list of ranges to defrag;
>
> 2) The defrag_collect_targets() function is called at defrag_one_range(),
>     after it locked a range that overlaps the range of the extent map;
>
> 3) At defrag_one_range(), while the range is still locked, we call
>     defrag_one_locked_target() for the range associated to the extent
>     map we collected at step 1);
>
> 4) Then finally at defrag_one_locked_target() we do a call to
>     btrfs_delalloc_reserve_space(), which will reserve data and metadata
>     space. If the space reservations can not be satisfied right away, the
>     flusher might be kicked in and start flushing delalloc and wait for
>     the respective ordered extents to complete. If this happens we will
>     deadlock, because both flushing delalloc and finishing an ordered
>     extent, requires locking the range in the inode's io tree, which was
>     already locked at defrag_collect_targets().

Could it be possible to allow btrfs_delalloc_reserve_space() to
choose the flush mode?
Like changing it to NO_FLUSH for this particular call site?

Thanks,
Qu

>
> So fix this by skipping extent maps for which there's already delalloc.
>
> Fixes: eb793cf857828d ("btrfs: defrag: introduce helper to collect target file extents")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/ioctl.c | 31 ++++++++++++++++++++++++++++++-
>   1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 550d8f2dfa37..0082e9a60bfc 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1211,6 +1211,35 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>   		if (em->generation < newer_than)
>   			goto next;
>
> +		/*
> +		 * Our start offset might be in the middle of an existing extent
> +		 * map, so take that into account.
> +		 */
> +		range_len = em->len - (cur - em->start);
> +		/*
> +		 * If this range of the extent map is already flagged for delalloc,
> > +		 * skip it, because:
> +		 *
> +		 * 1) We could deadlock later, when trying to reserve space for
> +		 *    delalloc, because in case we can't immediately reserve space
> +		 *    the flusher can start delalloc and wait for the respective
> +		 *    ordered extents to complete. The deadlock would happen
> +		 *    because we do the space reservation while holding the range
> +		 *    locked, and starting writeback, or finishing an ordered
> +		 *    extent, requires locking the range;
> +		 *
> +		 * 2) If there's delalloc there, it means there's dirty pages for
> +		 *    which writeback has not started yet (we clean the delalloc
> +		 *    flag when starting writeback and after creating an ordered
> +		 *    extent). If we mark pages in an adjacent range for defrag,
> +		 *    then we will have a larger contiguous range for delalloc,
> +		 *    very likely resulting in a larger extent after writeback is
> +		 *    triggered (except in a case of free space fragmentation).
> +		 */
> +		if (test_range_bit(&inode->io_tree, cur, cur + range_len - 1,
> +				   EXTENT_DELALLOC, 0, NULL))
> +			goto next;
> +
>   		/*
>   		 * For do_compress case, we want to compress all valid file
>   		 * extents, thus no @extent_thresh or mergeable check.
> @@ -1219,7 +1248,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>   			goto add;
>
>   		/* Skip too large extent */
> -		if (em->len >= extent_thresh)
> +		if (range_len >= extent_thresh)
>   			goto next;
>
>   		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: fix deadlock when reserving space during defrag
  2022-01-25  9:44 ` Qu Wenruo
@ 2022-01-25 10:00   ` Filipe Manana
  0 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2022-01-25 10:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jan 25, 2022 at 9:44 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2022/1/20 22:27, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When defragging we can end up collecting a range for defrag that has
> > already pages under delalloc (dirty), as long as the respective extent
> > map for their range is not mapped to a hole, a prealloc extent or
> > the extent map is from an old generation.
> >
> > Most of the time that is harmless from a functional perspective at
> > least, however it can result in a deadlock:
> >
> > 1) At defrag_collect_targets() we find an extent map that meets all
> >     requirements but there's delalloc for the range it covers, and we add
> >     its range to list of ranges to defrag;
> >
> > 2) The defrag_collect_targets() function is called at defrag_one_range(),
> >     after it locked a range that overlaps the range of the extent map;
> >
> > 3) At defrag_one_range(), while the range is still locked, we call
> >     defrag_one_locked_target() for the range associated to the extent
> >     map we collected at step 1);
> >
> > 4) Then finally at defrag_one_locked_target() we do a call to
> >     btrfs_delalloc_reserve_space(), which will reserve data and metadata
> >     space. If the space reservations can not be satisfied right away, the
> >     flusher might be kicked in and start flushing delalloc and wait for
> >     the respective ordered extents to complete. If this happens we will
> >     deadlock, because both flushing delalloc and finishing an ordered
> >     extent, requires locking the range in the inode's io tree, which was
> >     already locked at defrag_collect_targets().
>
> Could it be possible that we allow btrfs_delalloc_reserve_space() to
> choose the flush mode?
> Like changing it to NO_FLUSH for this particular call sites?

It could, but what do you gain with that?
Why would defrag be so special compared to everything else?

>
> Thanks,
> Qu
>
> >
> > So fix this by skipping extent maps for which there's already delalloc.
> >
> > Fixes: eb793cf857828d ("btrfs: defrag: introduce helper to collect target file extents")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >   fs/btrfs/ioctl.c | 31 ++++++++++++++++++++++++++++++-
> >   1 file changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 550d8f2dfa37..0082e9a60bfc 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -1211,6 +1211,35 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> >               if (em->generation < newer_than)
> >                       goto next;
> >
> > +             /*
> > +              * Our start offset might be in the middle of an existing extent
> > +              * map, so take that into account.
> > +              */
> > +             range_len = em->len - (cur - em->start);
> > +             /*
> > +              * If this range of the extent map is already flagged for delalloc,
> > +              * skip it, because:
> > +              *
> > +              * 1) We could deadlock later, when trying to reserve space for
> > +              *    delalloc, because in case we can't immediately reserve space
> > +              *    the flusher can start delalloc and wait for the respective
> > +              *    ordered extents to complete. The deadlock would happen
> > +              *    because we do the space reservation while holding the range
> > +              *    locked, and starting writeback, or finishing an ordered
> > +              *    extent, requires locking the range;
> > +              *
> > +              * 2) If there's delalloc there, it means there's dirty pages for
> > +              *    which writeback has not started yet (we clean the delalloc
> > +              *    flag when starting writeback and after creating an ordered
> > +              *    extent). If we mark pages in an adjacent range for defrag,
> > +              *    then we will have a larger contiguous range for delalloc,
> > +              *    very likely resulting in a larger extent after writeback is
> > +              *    triggered (except in a case of free space fragmentation).
> > +              */
> > +             if (test_range_bit(&inode->io_tree, cur, cur + range_len - 1,
> > +                                EXTENT_DELALLOC, 0, NULL))
> > +                     goto next;
> > +
> >               /*
> >                * For do_compress case, we want to compress all valid file
> >                * extents, thus no @extent_thresh or mergeable check.
> > @@ -1219,7 +1248,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> >                       goto add;
> >
> >               /* Skip too large extent */
> > -             if (em->len >= extent_thresh)
> > +             if (range_len >= extent_thresh)
> >                       goto next;
> >
> >               next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-01-25 10:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-20 14:27 [PATCH] btrfs: fix deadlock when reserving space during defrag fdmanana
2022-01-20 16:43 ` David Sterba
2022-01-20 23:39 ` Qu Wenruo
2022-01-21 10:15   ` Filipe Manana
2022-01-25  9:44 ` Qu Wenruo
2022-01-25 10:00   ` Filipe Manana
