[PATCH v3 0/3] btrfs: Make balance cancelling response faster

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/3] btrfs: Make balance cancelling response faster
@ 2020-02-17  6:16 Qu Wenruo
  2020-02-17  6:16 ` [PATCH v3 1/3] btrfs: relocation: Introduce error injection points for cancelling balance Qu Wenruo
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Qu Wenruo @ 2020-02-17  6:16 UTC (permalink / raw)
  To: linux-btrfs

[PROBLEM]
There are quite some users reporting that 'btrfs balance cancel' slow to
cancel current running balance, or even doesn't work for certain dead
balance loop.

With the following script showing how long it takes to fully stop a
balance:
  #!/bin/bash
  dev=/dev/test/test
  mnt=/mnt/btrfs

  umount $mnt &> /dev/null
  umount $dev &> /dev/null

  mkfs.btrfs -f $dev
  mount $dev -o nospace_cache $mnt

  dd if=/dev/zero bs=1M of=$mnt/large &
  dd_pid=$!

  sleep 3
  kill -KILL $dd_pid
  sync

  btrfs balance start --bg --full $mnt &
  sleep 1

  echo "cancel request" >> /dev/kmsg
  time btrfs balance cancel $mnt
  umount $mnt

It takes around 7~10s to cancel the running balance in my test
environment.

[CAUSE]
Btrfs uses btrfs_fs_info::balance_cancel_req to record how many cancel
request are queued.
However that cancelling request is only checked after relocating a block
group.

That behavior is far from optimal to provide a faster cancelling.

[FIX]
This patchset will add more cancelling check points, to make cancelling
faster.

And also, introduce a new error injection points to cover these newly
introduced and future check points.

For the canceled balance during relocate_block_group(), we are re-using
the existing error handling path.
It will mark all existing reloc_roots as orphan in prepare_to_merge(),
then queue all of them for cleanup in merge_reloc_roots().
Thus it shouldn't cause any problem.

Changelog:
v2:
- Rebased to v5.6-rc1
  There is a small conflicts caused by extra finished stage output.
  Other than that, everything is pretty straightforward

- Add explanation for the error handling path in cover letter.

v3:
- Add comment explaining how cancelling/error handling is done in
  relocate_block_group()
- Remove one patch which is already covered by patch 2 and 3.

Qu Wenruo (3):
  btrfs: relocation: Introduce error injection points for cancelling
    balance
  btrfs: relocation: Check cancel request after each data page read
  btrfs: relocation: Check cancel request after each extent found

 fs/btrfs/ctree.h      |  1 +
 fs/btrfs/relocation.c | 27 +++++++++++++++++++++++++++
 fs/btrfs/volumes.c    |  2 +-
 3 files changed, 29 insertions(+), 1 deletion(-)

-- 
2.25.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/3] btrfs: relocation: Introduce error injection points for cancelling balance
  2020-02-17  6:16 [PATCH v3 0/3] btrfs: Make balance cancelling response faster Qu Wenruo
@ 2020-02-17  6:16 ` Qu Wenruo
  2020-02-17  9:02   ` Johannes Thumshirn
  2020-02-17  6:16 ` [PATCH v3 2/3] btrfs: relocation: Check cancel request after each data page read Qu Wenruo
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2020-02-17  6:16 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new error injection point, should_cancel_balance().

It's just a wrapper of atomic_read(&fs_info->balance_cancel_req), but
allows us to override the return value.

Currently there are only one locations using this function:
- btrfs_balance()
  It checks cancel before each block group.

There are other locations checking fs_info->balance_cancel_req, but they
are not used as an indicator to exit, so there is no need to use the
wrapper.

But there will be more locations coming, and some locations can cause
kernel panic if not handled properly.

So introduce this error injection to provide better test interface.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h      |  1 +
 fs/btrfs/relocation.c | 11 +++++++++++
 fs/btrfs/volumes.c    |  2 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 36df977b64d9..d8e12ca15bb6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3401,6 +3401,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
 			      u64 *bytes_to_reserve);
 int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
 			      struct btrfs_pending_snapshot *pending);
+int should_cancel_balance(struct btrfs_fs_info *fs_info);
 
 /* scrub.c */
 int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 995d4b8b1cfd..475479d50cc3 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -9,6 +9,7 @@
 #include <linux/blkdev.h>
 #include <linux/rbtree.h>
 #include <linux/slab.h>
+#include <linux/error-injection.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -3264,6 +3265,16 @@ int setup_extent_mapping(struct inode *inode, u64 start, u64 end,
 	return ret;
 }
 
+/*
+ * This wrapper is to allow error injection to fully test all cancel
+ * call sites.
+ */
+int should_cancel_balance(struct btrfs_fs_info *fs_info)
+{
+	return atomic_read(&fs_info->balance_cancel_req);
+}
+ALLOW_ERROR_INJECTION(should_cancel_balance, TRUE);
+
 static int relocate_file_extent_cluster(struct inode *inode,
 					struct file_extent_cluster *cluster)
 {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9cfc668f91f4..cd61f4e29d9e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3904,7 +3904,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
 
 	if (btrfs_fs_closing(fs_info) ||
 	    atomic_read(&fs_info->balance_pause_req) ||
-	    atomic_read(&fs_info->balance_cancel_req)) {
+	    should_cancel_balance(fs_info)) {
 		ret = -EINVAL;
 		goto out;
 	}
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 2/3] btrfs: relocation: Check cancel request after each data page read
  2020-02-17  6:16 [PATCH v3 0/3] btrfs: Make balance cancelling response faster Qu Wenruo
  2020-02-17  6:16 ` [PATCH v3 1/3] btrfs: relocation: Introduce error injection points for cancelling balance Qu Wenruo
@ 2020-02-17  6:16 ` Qu Wenruo
  2020-02-17  9:03   ` Johannes Thumshirn
  2020-02-17  6:16 ` [PATCH v3 3/3] btrfs: relocation: Check cancel request after each extent found Qu Wenruo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2020-02-17  6:16 UTC (permalink / raw)
  To: linux-btrfs

When relocating a data extents with large large data extents, we spend
most of our time in relocate_file_extent_cluster() at stage "moving data
extents":
 1)               |  btrfs_relocate_block_group [btrfs]() {
 1)               |    relocate_file_extent_cluster [btrfs]() {
 1) $ 6586769 us  |    }
 1) + 18.260 us   |    relocate_file_extent_cluster [btrfs]();
 1) + 15.770 us   |    relocate_file_extent_cluster [btrfs]();
 1) $ 8916340 us  |  }
 1)               |  btrfs_relocate_block_group [btrfs]() {
 1)               |    relocate_file_extent_cluster [btrfs]() {
 1) $ 11611586 us |    }
 1) + 16.930 us   |    relocate_file_extent_cluster [btrfs]();
 1) + 15.870 us   |    relocate_file_extent_cluster [btrfs]();
 1) $ 14986130 us |  }

So to make data relocation cancelling quicker, here add extra balance
cancelling check after each page read in relocate_file_extent_cluster().

Also add a comment explaining how the cancelling/error handling works.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 475479d50cc3..dc97fdd58c3d 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3396,6 +3396,10 @@ static int relocate_file_extent_cluster(struct inode *inode,
 		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
 		balance_dirty_pages_ratelimited(inode->i_mapping);
 		btrfs_throttle(fs_info);
+		if (should_cancel_balance(fs_info)) {
+			ret = -ECANCELED;
+			goto out;
+		}
 	}
 	WARN_ON(nr != cluster->nr);
 out:
@@ -4208,6 +4212,14 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 	backref_cache_cleanup(&rc->backref_cache);
 	btrfs_block_rsv_release(fs_info, rc->block_rsv, (u64)-1);
 
+	/*
+	 * Even when the relocation is cancelled, we should all go throught
+	 * prepare_to_merge() and merge_reloc_roots().
+	 *
+	 * For error (including cancelled balance), prepare_to_merge() will
+	 * mark all reloc tree orphan, then queue them for cleanup in
+	 * merge_reloc_roots()
+	 */
 	err = prepare_to_merge(rc, err);
 
 	merge_reloc_roots(rc);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 3/3] btrfs: relocation: Check cancel request after each extent found
  2020-02-17  6:16 [PATCH v3 0/3] btrfs: Make balance cancelling response faster Qu Wenruo
  2020-02-17  6:16 ` [PATCH v3 1/3] btrfs: relocation: Introduce error injection points for cancelling balance Qu Wenruo
  2020-02-17  6:16 ` [PATCH v3 2/3] btrfs: relocation: Check cancel request after each data page read Qu Wenruo
@ 2020-02-17  6:16 ` Qu Wenruo
  2020-02-17  9:04   ` Johannes Thumshirn
  2020-02-17  9:39 ` [PATCH v3 0/3] btrfs: Make balance cancelling response faster Graham Cobb
  2020-03-09 23:25 ` David Sterba
  4 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2020-02-17  6:16 UTC (permalink / raw)
  To: linux-btrfs

When relocating data block groups with tons of small extents, or
large metadata block groups, there can be over 200,000 extents.

We will iterate all extents of such block group in relocate_block_group(),
where iteration itself can be kinda time-consuming.

So when user want to cancel the balance, the extent iteration loop can
be another target.

This patch will add the cancelling check in the extent iteration loop of
relocate_block_group() to make balance cancelling faster.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index dc97fdd58c3d..67433ae476d6 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4181,6 +4181,10 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 				break;
 			}
 		}
+		if (should_cancel_balance(fs_info)) {
+			err = -ECANCELED;
+			break;
+		}
 	}
 	if (trans && progress && err == -ENOSPC) {
 		ret = btrfs_force_chunk_alloc(trans, rc->block_group->flags);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] btrfs: relocation: Introduce error injection points for cancelling balance
  2020-02-17  6:16 ` [PATCH v3 1/3] btrfs: relocation: Introduce error injection points for cancelling balance Qu Wenruo
@ 2020-02-17  9:02   ` Johannes Thumshirn
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Thumshirn @ 2020-02-17  9:02 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/3] btrfs: relocation: Check cancel request after each data page read
  2020-02-17  6:16 ` [PATCH v3 2/3] btrfs: relocation: Check cancel request after each data page read Qu Wenruo
@ 2020-02-17  9:03   ` Johannes Thumshirn
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Thumshirn @ 2020-02-17  9:03 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 3/3] btrfs: relocation: Check cancel request after each extent found
  2020-02-17  6:16 ` [PATCH v3 3/3] btrfs: relocation: Check cancel request after each extent found Qu Wenruo
@ 2020-02-17  9:04   ` Johannes Thumshirn
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Thumshirn @ 2020-02-17  9:04 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 0/3] btrfs: Make balance cancelling response faster
  2020-02-17  6:16 [PATCH v3 0/3] btrfs: Make balance cancelling response faster Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-02-17  6:16 ` [PATCH v3 3/3] btrfs: relocation: Check cancel request after each extent found Qu Wenruo
@ 2020-02-17  9:39 ` Graham Cobb
  2020-02-17  9:53   ` Qu Wenruo
  2020-03-09 23:25 ` David Sterba
  4 siblings, 1 reply; 10+ messages in thread
From: Graham Cobb @ 2020-02-17  9:39 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 17/02/2020 06:16, Qu Wenruo wrote:
...
> [FIX]
> This patchset will add more cancelling check points, to make cancelling
> faster.

I have a question on what this means for users of the balance ioctl.

I *think* that, today, there is a guarantee that if I issue the ioctl to
start a balance, and then immediately (or, some time later) cancel it, I
will be guaranteed that at least one block group will be balanced. In
particular, if I repeat that behaviour, the balance will eventually
complete.

Is that true?

If so, what happens to this guarantee with these changes?

I think, from your description, that the cancel can complete without
even one block group being balanced. Is it guaranteed to have made
progress on that bg so that if I repeat the behaviour that bg (and
eventually all eligible bgs) will be balanced? Or is it now possible to
cancel before any progress has been made sp that process never finishes?

If the latter, how long do I have to wait before cancelling to make sure
that progress has been made? Is there any way to know whether progress
has been made when the ioctl completes with the cancelled status?

This is a real question because I have some filesystems where balancing
a single block group can sometimes take tens of minutes, and the system
is impacted during that time. I already have my btrfs-balance-slowly
script which allows me to control impact by not trying to balance the
next bg for a while, so the system can recover and progress other tasks.
And it would be great to be able to limit the impact further by
cancelling during a single bg, but only if I can be sure that progress
has been made and that by repeating the process I am guaranteed that it
will eventually finish.

In any case, I suggest that the patch cover letter (and maybe code
comments) explains what guarantees (if any) userspace is given.

Graham

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 0/3] btrfs: Make balance cancelling response faster
  2020-02-17  9:39 ` [PATCH v3 0/3] btrfs: Make balance cancelling response faster Graham Cobb
@ 2020-02-17  9:53   ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2020-02-17  9:53 UTC (permalink / raw)
  To: Graham Cobb, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2463 bytes --]



On 2020/2/17 下午5:39, Graham Cobb wrote:
> On 17/02/2020 06:16, Qu Wenruo wrote:
> ...
>> [FIX]
>> This patchset will add more cancelling check points, to make cancelling
>> faster.
> 
> I have a question on what this means for users of the balance ioctl.
> 
> I *think* that, today, there is a guarantee that if I issue the ioctl to
> start a balance, and then immediately (or, some time later) cancel it, I
> will be guaranteed that at least one block group will be balanced.

Nope, even for the original behavior.

The original check happens before we relocate one block group, so even
for the old behavior, you can still cancel the balance before it even
starts.


> In
> particular, if I repeat that behaviour, the balance will eventually
> complete.
> 
> Is that true?
> 
> If so, what happens to this guarantee with these changes?
> 
> I think, from your description, that the cancel can complete without
> even one block group being balanced. Is it guaranteed to have made
> progress on that bg so that if I repeat the behaviour that bg (and
> eventually all eligible bgs) will be balanced? Or is it now possible to
> cancel before any progress has been made sp that process never finishes?
> 
> If the latter, how long do I have to wait before cancelling to make sure
> that progress has been made? Is there any way to know whether progress
> has been made when the ioctl completes with the cancelled status?

The ioctl itself already has the status recording how many block group
has been relocated, and the bytenr of the last relocated block group.
These behaviors haven't been changed by this patchset.



Thanks,
Qu

> 
> This is a real question because I have some filesystems where balancing
> a single block group can sometimes take tens of minutes, and the system
> is impacted during that time. I already have my btrfs-balance-slowly
> script which allows me to control impact by not trying to balance the
> next bg for a while, so the system can recover and progress other tasks.
> And it would be great to be able to limit the impact further by
> cancelling during a single bg, but only if I can be sure that progress
> has been made and that by repeating the process I am guaranteed that it
> will eventually finish.
> 
> In any case, I suggest that the patch cover letter (and maybe code
> comments) explains what guarantees (if any) userspace is given.
> 
> Graham
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 0/3] btrfs: Make balance cancelling response faster
  2020-02-17  6:16 [PATCH v3 0/3] btrfs: Make balance cancelling response faster Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-02-17  9:39 ` [PATCH v3 0/3] btrfs: Make balance cancelling response faster Graham Cobb
@ 2020-03-09 23:25 ` David Sterba
  4 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2020-03-09 23:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Feb 17, 2020 at 02:16:51PM +0800, Qu Wenruo wrote:
> [PROBLEM]
> There are quite some users reporting that 'btrfs balance cancel' slow to
> cancel current running balance, or even doesn't work for certain dead
> balance loop.
> 
> With the following script showing how long it takes to fully stop a
> balance:
> 
> [FIX]
> This patchset will add more cancelling check points, to make cancelling
> faster.

Patches have been in for-next and also used to develop various fixes in
the relocation code, so from that point I think it's ok to move them to
misc-next. The benefit for usability is also clear, so thanks for
working on that. As pointed elsewhere, the commandline UI will need to
be extended to wire the cancelation of resize/deletion to make use of
it, we have time for that for the next cycle.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-03-09 23:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-17  6:16 [PATCH v3 0/3] btrfs: Make balance cancelling response faster Qu Wenruo
2020-02-17  6:16 ` [PATCH v3 1/3] btrfs: relocation: Introduce error injection points for cancelling balance Qu Wenruo
2020-02-17  9:02   ` Johannes Thumshirn
2020-02-17  6:16 ` [PATCH v3 2/3] btrfs: relocation: Check cancel request after each data page read Qu Wenruo
2020-02-17  9:03   ` Johannes Thumshirn
2020-02-17  6:16 ` [PATCH v3 3/3] btrfs: relocation: Check cancel request after each extent found Qu Wenruo
2020-02-17  9:04   ` Johannes Thumshirn
2020-02-17  9:39 ` [PATCH v3 0/3] btrfs: Make balance cancelling response faster Graham Cobb
2020-02-17  9:53   ` Qu Wenruo
2020-03-09 23:25 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.