Linux-BTRFS Archive on lore.kernel.org
* [PATCH 0/4] btrfs: Make balance cancelling response faster
@ 2019-12-03  6:24 Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2019-12-03  6:24 UTC
  To: linux-btrfs

[PROBLEM]
Quite a few users report that 'btrfs balance cancel' is slow to cancel
a running balance, or doesn't work at all for certain dead balance
loops.

The following script shows how long it takes to fully stop a
balance:
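  #!/bin/bash
  # Reproducer: time how long 'btrfs balance cancel' takes on a running balance.
  dev=/dev/test/test
  mnt=/mnt/btrfs

  # Make sure neither the mount point nor the device is still mounted.
  umount $mnt &> /dev/null
  umount $dev &> /dev/null

  mkfs.btrfs -f $dev
  mount $dev -o nospace_cache $mnt

  # Write data for about 3 seconds so the balance has something to relocate.
  dd if=/dev/zero bs=1M of=$mnt/large &
  dd_pid=$!

  sleep 3
  kill -KILL $dd_pid
  sync

  # Start a full balance in the background and let it run for a second.
  btrfs balance start --bg --full $mnt &
  sleep 1

  # Mark the cancel request in the kernel log, then time the cancel.
  echo "cancel request" >> /dev/kmsg
  time btrfs balance cancel $mnt
  umount $mnt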

It takes around 7~10s to cancel the running balance in my test
environment.
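
For repeated measurements, a small timing helper can stand in for the
'time' builtin. This is only a sketch and not part of the patchset: the
name elapsed_ms is made up for illustration, and it assumes GNU date
(for the %N nanosecond format).

```shell
#!/bin/bash
# elapsed_ms CMD [ARGS...]
# Hypothetical helper: runs the command and prints its wall-clock
# duration in milliseconds. Assumes GNU date, which supports %N.
# The command's own output and exit status are discarded.
elapsed_ms() {
    local start end
    start=$(date +%s%N)
    "$@" > /dev/null 2>&1 || true
    end=$(date +%s%N)
    # Convert the nanosecond difference to milliseconds.
    echo $(( (end - start) / 1000000 ))
}
```

In the script above, 'elapsed_ms btrfs balance cancel $mnt' could replace
the 'time btrfs balance cancel $mnt' line, which makes it easier to collect
the latency from several runs.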

[CAUSE]
Btrfs uses btrfs_fs_info::balance_cancel_req to record how many cancel
requests are queued.
However, that counter is only checked after relocating a whole block
group.

That makes cancelling far slower than it needs to be.
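
To make the granularity problem concrete, here is a toy model in bash.
It is not kernel code and touches no filesystem: a "balance" relocates
some block groups, each made of several extents, and a cancel request
arrives partway through. The function name simulate_balance and all
parameters are hypothetical; only the idea of checking the request once
per block group versus more often comes from the description above.

```shell
#!/bin/bash
# Toy model of cancel latency; nothing here touches a real filesystem.
# simulate_balance MODE NGROUPS NEXTENTS CANCEL_AT
#   MODE      "group"  = check the cancel flag only after each block group
#             "extent" = also check it after every extent
#   NGROUPS   number of block groups to relocate
#   NEXTENTS  extents per block group (one extent = one unit of work)
#   CANCEL_AT unit of work after which the cancel request is raised
# Prints how many units of work ran after the request was raised.
simulate_balance() {
    local mode=$1 ngroups=$2 nextents=$3 cancel_at=$4
    local done_units=0 cancel_req=0 wasted=0
    local g e

    for ((g = 0; g < ngroups; g++)); do
        for ((e = 0; e < nextents; e++)); do
            done_units=$((done_units + 1))
            # Work done while a request is already pending is wasted.
            if ((cancel_req)); then wasted=$((wasted + 1)); fi
            # The user runs "balance cancel" at this point in time.
            if ((done_units == cancel_at)); then cancel_req=1; fi
            # Fine-grained check: stop right after the current extent.
            if [[ $mode == extent ]] && ((cancel_req)); then
                echo "$wasted"
                return 0
            fi
        done
        # Coarse check: only reached once the whole block group is done.
        if ((cancel_req)); then
            echo "$wasted"
            return 0
        fi
    done
    echo "$wasted"
}
```

With 4 block groups of 100 extents and a cancel request after unit 10,
'simulate_balance group 4 100 10' prints 90, while
'simulate_balance extent 4 100 10' prints 0: the coarse check has to
finish the rest of the block group before it notices the request.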

[FIX]
This patchset adds more cancellation check points to make cancelling
faster.

It also introduces new error injection points to cover these newly
introduced check points, as well as future ones.

Qu Wenruo (4):
  btrfs: relocation: Introduce error injection points for cancelling
    balance
  btrfs: relocation: Check cancel request after each data page read
  btrfs: relocation: Check cancel request after each extent found
  btrfs: relocation: Work around dead relocation stage loop

 fs/btrfs/ctree.h      |  1 +
 fs/btrfs/relocation.c | 23 +++++++++++++++++++++++
 fs/btrfs/volumes.c    |  2 +-
 3 files changed, 25 insertions(+), 1 deletion(-)

-- 
2.24.0



* Re: [PATCH 0/4] btrfs: Make balance cancelling response faster
  2019-12-05  2:58 ` Zygo Blaxell
@ 2019-12-05  3:26   ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2019-12-05  3:26 UTC
  To: Zygo Blaxell, Qu Wenruo; +Cc: linux-btrfs

On 2019/12/5 10:58 AM, Zygo Blaxell wrote:
> On Tue, Dec 03, 2019 at 02:42:50PM +0800, Qu Wenruo wrote:
>> [PROBLEM]
>> There are quite some users reporting that 'btrfs balance cancel' slow to
>> cancel current running balance, or even doesn't work for certain dead
>> balance loop.
>>
>> With the following script showing how long it takes to fully stop a
>> balance:
>>   #!/bin/bash
>>   dev=/dev/test/test
>>   mnt=/mnt/btrfs
>>
>>   umount $mnt &> /dev/null
>>   umount $dev &> /dev/null
>>
>>   mkfs.btrfs -f $dev
>>   mount $dev -o nospace_cache $mnt
>>
>>   dd if=/dev/zero bs=1M of=$mnt/large &
>>   dd_pid=$!
>>
>>   sleep 3
>>   kill -KILL $dd_pid
>>   sync
>>
>>   btrfs balance start --bg --full $mnt &
>>   sleep 1
>>
>>   echo "cancel request" >> /dev/kmsg
>>   time btrfs balance cancel $mnt
>>   umount $mnt
>>
>> It takes around 7~10s to cancel the running balance in my test
>> environment.
>>
>> [CAUSE]
>> Btrfs uses btrfs_fs_info::balance_cancel_req to record how many cancel
>> request are queued.
>> However that cancelling request is only checked after relocating a block
>> group.
>>
>> That behavior is far from optimal to provide a faster cancelling.
>>
>> [FIX]
>> This patchset will add more cancelling check points, to make cancelling
>> faster.
> 
> Nice!  I look forward to using this in the future!
> 
> Does this cover device delete/resize as well?

Shrink also makes use of balance, so I see no reason why it wouldn't work
for such use cases.

>  I think there needs to be
> a check added for fatal signals for those to work, as they don't respond
> to balance cancel.

That's a good extra idea.

Since we have that wrapper, it would be easier to add in the future.

Thanks,
Qu

> 
>> And also, introduce a new error injection points to cover these newly
>> introduced and future check points.
>>
>> Qu Wenruo (4):
>>   btrfs: relocation: Introduce error injection points for cancelling
>>     balance
>>   btrfs: relocation: Check cancel request after each data page read
>>   btrfs: relocation: Check cancel request after each extent found
>>   btrfs: relocation: Work around dead relocation stage loop
>>
>>  fs/btrfs/ctree.h      |  1 +
>>  fs/btrfs/relocation.c | 23 +++++++++++++++++++++++
>>  fs/btrfs/volumes.c    |  2 +-
>>  3 files changed, 25 insertions(+), 1 deletion(-)
>>
>> -- 
>> 2.24.0
>>



* Re: [PATCH 0/4] btrfs: Make balance cancelling response faster
  2019-12-03  6:42 Qu Wenruo
  2019-12-04 16:39 ` David Sterba
@ 2019-12-05  2:58 ` Zygo Blaxell
  2019-12-05  3:26   ` Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Zygo Blaxell @ 2019-12-05  2:58 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Dec 03, 2019 at 02:42:50PM +0800, Qu Wenruo wrote:
> [PROBLEM]
> There are quite some users reporting that 'btrfs balance cancel' slow to
> cancel current running balance, or even doesn't work for certain dead
> balance loop.
> 
> With the following script showing how long it takes to fully stop a
> balance:
>   #!/bin/bash
>   dev=/dev/test/test
>   mnt=/mnt/btrfs
> 
>   umount $mnt &> /dev/null
>   umount $dev &> /dev/null
> 
>   mkfs.btrfs -f $dev
>   mount $dev -o nospace_cache $mnt
> 
>   dd if=/dev/zero bs=1M of=$mnt/large &
>   dd_pid=$!
> 
>   sleep 3
>   kill -KILL $dd_pid
>   sync
> 
>   btrfs balance start --bg --full $mnt &
>   sleep 1
> 
>   echo "cancel request" >> /dev/kmsg
>   time btrfs balance cancel $mnt
>   umount $mnt
> 
> It takes around 7~10s to cancel the running balance in my test
> environment.
> 
> [CAUSE]
> Btrfs uses btrfs_fs_info::balance_cancel_req to record how many cancel
> request are queued.
> However that cancelling request is only checked after relocating a block
> group.
> 
> That behavior is far from optimal to provide a faster cancelling.
> 
> [FIX]
> This patchset will add more cancelling check points, to make cancelling
> faster.

Nice!  I look forward to using this in the future!

Does this cover device delete/resize as well?  I think there needs to be
a check added for fatal signals for those to work, as they don't respond
to balance cancel.

> And also, introduce a new error injection points to cover these newly
> introduced and future check points.
> 
> Qu Wenruo (4):
>   btrfs: relocation: Introduce error injection points for cancelling
>     balance
>   btrfs: relocation: Check cancel request after each data page read
>   btrfs: relocation: Check cancel request after each extent found
>   btrfs: relocation: Work around dead relocation stage loop
> 
>  fs/btrfs/ctree.h      |  1 +
>  fs/btrfs/relocation.c | 23 +++++++++++++++++++++++
>  fs/btrfs/volumes.c    |  2 +-
>  3 files changed, 25 insertions(+), 1 deletion(-)
> 
> -- 
> 2.24.0
> 


* Re: [PATCH 0/4] btrfs: Make balance cancelling response faster
  2019-12-03  6:42 Qu Wenruo
@ 2019-12-04 16:39 ` David Sterba
  2019-12-05  2:58 ` Zygo Blaxell
  1 sibling, 0 replies; 5+ messages in thread
From: David Sterba @ 2019-12-04 16:39 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Dec 03, 2019 at 02:42:50PM +0800, Qu Wenruo wrote:
> [PROBLEM]
> There are quite some users reporting that 'btrfs balance cancel' slow to
> cancel current running balance, or even doesn't work for certain dead
> balance loop.
> 
> With the following script showing how long it takes to fully stop a
> balance:
>   #!/bin/bash
>   dev=/dev/test/test
>   mnt=/mnt/btrfs
> 
>   umount $mnt &> /dev/null
>   umount $dev &> /dev/null
> 
>   mkfs.btrfs -f $dev
>   mount $dev -o nospace_cache $mnt
> 
>   dd if=/dev/zero bs=1M of=$mnt/large &
>   dd_pid=$!
> 
>   sleep 3
>   kill -KILL $dd_pid
>   sync
> 
>   btrfs balance start --bg --full $mnt &
>   sleep 1
> 
>   echo "cancel request" >> /dev/kmsg
>   time btrfs balance cancel $mnt
>   umount $mnt
> 
> It takes around 7~10s to cancel the running balance in my test
> environment.
> 
> [CAUSE]
> Btrfs uses btrfs_fs_info::balance_cancel_req to record how many cancel
> request are queued.
> However that cancelling request is only checked after relocating a block
> group.

Yes that's the reason why it takes so long to cancel. Adding more
cancellation points is fine, but I don't know what exactly happens when
the block group relocation is not finished. There's code to merge the
reloc inode and commit that, but that's only a high-level view of the
thing.

