* Is "btrfs balance start" truly asynchronous?
@ 2016-06-20 16:33 Dmitry Katsubo
  2016-06-21  8:55 ` Duncan
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Katsubo @ 2016-06-20 16:33 UTC (permalink / raw)
  To: linux-btrfs

Dear btrfs community,

I added a drive to an existing raid1 btrfs volume and decided to run a
balance so that data is distributed "fairly" among the drives. I started
"btrfs balance start", but it blocked for about 5-10 minutes while
working intensively. It then printed something like "had to relocate 50
chunks" and exited. Judging by drive I/O, "btrfs balance" had done most
(if not all) of the work by the time it exited.

Shouldn't "btrfs balance start" do the operation in the background?

Thanks for any information.

-- 
With best regards,
Dmitry


* Re: Is "btrfs balance start" truly asynchronous?
  2016-06-20 16:33 Is "btrfs balance start" truly asynchronous? Dmitry Katsubo
@ 2016-06-21  8:55 ` Duncan
  2016-06-21 11:24   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 9+ messages in thread
From: Duncan @ 2016-06-21  8:55 UTC (permalink / raw)
  To: linux-btrfs

Dmitry Katsubo posted on Mon, 20 Jun 2016 18:33:54 +0200 as excerpted:

> Dear btfs community,
> 
> I have added a drive to existing raid1 btrfs volume and decided to
> perform balancing so that data distributes "fairly" among drives. I have
> started "btrfs balance start", but it stalled for about 5-10 minutes
> intensively doing the work. After that time it has printed something
> like "had to relocate 50 chunks" and exited. According to drive I/O,
> "btrfs balance" did most (if not all) of the work, so after it has
> exited the job was done.
> 
> Shouldn't "btrfs balance start" do the operation in the background?

From the btrfs-balance (8) manpage (from btrfs-progs-4.5.3):

start [options] <path>
    start the balance operation according to the specified filters,
    no filters will rewrite the entire filesystem. The process runs
    in the foreground.


So the balance start operation runs in the foreground, but as explained 
elsewhere in the manpage, the balance is interruptible by unmount and 
will automatically restart after a remount.  It can also be paused and 
resumed or canceled with the appropriate btrfs balance subcommands.
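
As a rough illustration of the status subcommand mentioned above, here
is a minimal sketch of a helper that reports whether a balance is
running. The function name balance_state and the BTRFS variable are
made up for this example. The exit-code mapping (0 = no balance
running, 1 = balance running or bad usage, anything else = error) is
taken from the EXIT STATUS section of btrfs-balance(8) as best
understood here, and should be checked against the installed
btrfs-progs version.

```sh
#!/bin/sh
# Hypothetical helper, not part of btrfs-progs.
# BTRFS can be overridden, e.g. to point at a test stub.
BTRFS=${BTRFS:-btrfs}

# balance_state <mountpoint>: print "idle", "running", or "error".
balance_state() {
    rc=0
    "$BTRFS" balance status "$1" >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo idle ;;      # no balance in progress
        1) echo running ;;   # assumption: 1 is also returned for bad usage
        *) echo error ;;
    esac
}
```

For example, "balance_state /mnt" should print "running" while a
balance is in progress; "btrfs balance pause", "resume" and "cancel"
on the same path then act on that operation.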

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Is "btrfs balance start" truly asynchronous?
  2016-06-21  8:55 ` Duncan
@ 2016-06-21 11:24   ` Austin S. Hemmelgarn
  2016-06-21 11:33     ` Hugo Mills
  2016-06-21 12:19     ` Zygo Blaxell
  0 siblings, 2 replies; 9+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-21 11:24 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 2016-06-21 04:55, Duncan wrote:
> Dmitry Katsubo posted on Mon, 20 Jun 2016 18:33:54 +0200 as excerpted:
>
>> Dear btfs community,
>>
>> I have added a drive to existing raid1 btrfs volume and decided to
>> perform balancing so that data distributes "fairly" among drives. I have
>> started "btrfs balance start", but it stalled for about 5-10 minutes
>> intensively doing the work. After that time it has printed something
>> like "had to relocate 50 chunks" and exited. According to drive I/O,
>> "btrfs balance" did most (if not all) of the work, so after it has
>> exited the job was done.
>>
>> Shouldn't "btrfs balance start" do the operation in the background?
>
> From the btrfs-balance (8) manpage (from btrfs-progs-4.5.3):
>
> start [options] <path>
>     start the balance operation according to the specified filters,
>     no filters will rewrite the entire filesystem. The process runs
>     in the foreground.
>
>
> So the balance start operation runs in the foreground, but as explained
> elsewhere in the manpage, the balance is interruptible by unmount and
> will automatically restart after a remount.  It can also be paused and
> resumed or canceled with the appropriate btrfs balance subcommands.
>
FWIW, there was some talk a while back about possibly providing an
option to run balance in the background.  If I find the time, I may
write a patch for this.  It would be userland only: I'm not interested
in mucking around with the kernel side of things, and it's fully
possible to do this using just libc functions.  It's something I'd
rather like to have myself, because the current method of using job
control in a shell doesn't really work in some circumstances (for
example, you can't easily start a balance on a remote system via an ssh
command, which is the specific use case I have).
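
Until such an option exists, one way to approximate it from the shell
is to detach the command from the terminal so that the invoking shell
can exit while the balance continues. A minimal sketch under stated
assumptions: balance_detached, the BTRFS variable and the log file
argument are hypothetical names for this example, not part of
btrfs-progs.

```sh
#!/bin/sh
# Hypothetical helper, not part of btrfs-progs.
BTRFS=${BTRFS:-btrfs}

# balance_detached <logfile> <mountpoint> [filter...]
# Starts "btrfs balance start" immune to hangups, with stdin, stdout and
# stderr detached from the terminal, and prints the PID of the job.
balance_detached() {
    log=$1 mnt=$2
    shift 2
    # Filters go before the path: btrfs balance start [options] <path>
    nohup "$BTRFS" balance start "$@" "$mnt" >"$log" 2>&1 </dev/null &
    echo "$!"
}
```

A call such as "balance_detached /tmp/balance.log /mnt" returns
immediately; the final "had to relocate ..." message ends up in the log
file instead of on the terminal.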


* Re: Is "btrfs balance start" truly asynchronous?
  2016-06-21 11:24   ` Austin S. Hemmelgarn
@ 2016-06-21 11:33     ` Hugo Mills
  2016-06-21 11:51       ` Austin S. Hemmelgarn
  2016-06-21 12:19     ` Zygo Blaxell
  1 sibling, 1 reply; 9+ messages in thread
From: Hugo Mills @ 2016-06-21 11:33 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Duncan, linux-btrfs


On Tue, Jun 21, 2016 at 07:24:24AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-06-21 04:55, Duncan wrote:
> >Dmitry Katsubo posted on Mon, 20 Jun 2016 18:33:54 +0200 as excerpted:
> >
> >>Dear btfs community,
> >>
> >>I have added a drive to existing raid1 btrfs volume and decided to
> >>perform balancing so that data distributes "fairly" among drives. I have
> >>started "btrfs balance start", but it stalled for about 5-10 minutes
> >>intensively doing the work. After that time it has printed something
> >>like "had to relocate 50 chunks" and exited. According to drive I/O,
> >>"btrfs balance" did most (if not all) of the work, so after it has
> >>exited the job was done.
> >>
> >>Shouldn't "btrfs balance start" do the operation in the background?
> >
> >From the btrfs-balance (8) manpage (from btrfs-progs-4.5.3):
> >
> >start [options] <path>
> >    start the balance operation according to the specified filters,
> >    no filters will rewrite the entire filesystem. The process runs
> >    in the foreground.
> >
> >
> >So the balance start operation runs in the foreground, but as explained
> >elsewhere in the manpage, the balance is interruptible by unmount and
> >will automatically restart after a remount.  It can also be paused and
> >resumed or canceled with the appropriate btrfs balance subcommands.
> >
> FWIW, there was some talk a while back about possibly providing an
> option to run balance in the background.  If I end up finding the
> time, I may write a patch for this (userland only, I'm not
> interested in mucking around with the kernel side of things, and
> it's fully possible to do this just using libc functions), as it's
> something I'd rather like to have myself, as the current method of
> using job control in a shell doesn't really work in some
> circumstances (for example, you can't easily start a balance on a
> remote system via a ssh command, which is the specific use case I
> have).

   There's quite a bit of infrastructure in the userspace tools to
deal with managing an asynchronous scrub. It would probably be worth
looking at that in the first instance to see if it can be reused for
balance.

   Hugo.

-- 
Hugo Mills             |
hugo@... carfax.org.uk | __(_'>
http://carfax.org.uk/  | Squeak!
PGP: E2AB1DE4          |



* Re: Is "btrfs balance start" truly asynchronous?
  2016-06-21 11:33     ` Hugo Mills
@ 2016-06-21 11:51       ` Austin S. Hemmelgarn
  2016-06-21 13:17         ` Graham Cobb
  0 siblings, 1 reply; 9+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-21 11:51 UTC (permalink / raw)
  To: Hugo Mills, Duncan, linux-btrfs

On 2016-06-21 07:33, Hugo Mills wrote:
> On Tue, Jun 21, 2016 at 07:24:24AM -0400, Austin S. Hemmelgarn wrote:
>> On 2016-06-21 04:55, Duncan wrote:
>>> Dmitry Katsubo posted on Mon, 20 Jun 2016 18:33:54 +0200 as excerpted:
>>>
>>>> Dear btfs community,
>>>>
>>>> I have added a drive to existing raid1 btrfs volume and decided to
>>>> perform balancing so that data distributes "fairly" among drives. I have
>>>> started "btrfs balance start", but it stalled for about 5-10 minutes
>>>> intensively doing the work. After that time it has printed something
>>>> like "had to relocate 50 chunks" and exited. According to drive I/O,
>>>> "btrfs balance" did most (if not all) of the work, so after it has
>>>> exited the job was done.
>>>>
>>>> Shouldn't "btrfs balance start" do the operation in the background?
>>>
>>> From the btrfs-balance (8) manpage (from btrfs-progs-4.5.3):
>>>
>>> start [options] <path>
>>>    start the balance operation according to the specified filters,
>>>    no filters will rewrite the entire filesystem. The process runs
>>>    in the foreground.
>>>
>>>
>>> So the balance start operation runs in the foreground, but as explained
>>> elsewhere in the manpage, the balance is interruptible by unmount and
>>> will automatically restart after a remount.  It can also be paused and
>>> resumed or canceled with the appropriate btrfs balance subcommands.
>>>
>> FWIW, there was some talk a while back about possibly providing an
>> option to run balance in the background.  If I end up finding the
>> time, I may write a patch for this (userland only, I'm not
>> interested in mucking around with the kernel side of things, and
>> it's fully possible to do this just using libc functions), as it's
>> something I'd rather like to have myself, as the current method of
>> using job control in a shell doesn't really work in some
>> circumstances (for example, you can't easily start a balance on a
>> remote system via a ssh command, which is the specific use case I
>> have).
>
>    There's quite a bit of infrastructure in the userspace tools to
> deal with managing an asynchronous scrub. It would probably be worth
> looking at that in the first instance to see if it can be reused for
> balance.
Yeah, but we've already got most of what's needed for an asynchronous
balance.  The kernel itself functionally mutexes balances (at least,
I'm pretty certain it does), we already store state in the filesystem
itself (so that it can be auto-resumed on remount), and we already have
commands for pausing, resuming, canceling, and checking status.  The
only thing that appears to be missing is the ability to have the
balance backgrounded by the tools themselves instead of needing POSIX
sh job control or something else to daemonize it.

The scrub design works, but the whole state file thing has some rather 
irritating side effects and other implications, and developed out of 
requirements that aren't present for balance (it might be nice to check 
how many chunks actually got balanced after the fact, but it's not 
absolutely necessary).



* Re: Is "btrfs balance start" truly asynchronous?
  2016-06-21 11:24   ` Austin S. Hemmelgarn
  2016-06-21 11:33     ` Hugo Mills
@ 2016-06-21 12:19     ` Zygo Blaxell
  1 sibling, 0 replies; 9+ messages in thread
From: Zygo Blaxell @ 2016-06-21 12:19 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Duncan, linux-btrfs


On Tue, Jun 21, 2016 at 07:24:24AM -0400, Austin S. Hemmelgarn wrote:
> (for example, you can't easily start a balance on a remote
> system via a ssh command, which is the specific use case I have).

Wait, what?

	ssh remotehost -n btrfs balance start -d... -m... /foo \&

or

	ssh remotehost -f btrfs balance start -d... -m... /foo

It even works with systemd's auto-kill feature (send btrfs balance all the
SIGKILLs you want, the kernel just ignores them).




* Re: Is "btrfs balance start" truly asynchronous?
  2016-06-21 11:51       ` Austin S. Hemmelgarn
@ 2016-06-21 13:17         ` Graham Cobb
  2016-06-21 13:44           ` Lionel Bouton
  2016-06-21 23:30           ` Dmitry Katsubo
  0 siblings, 2 replies; 9+ messages in thread
From: Graham Cobb @ 2016-06-21 13:17 UTC (permalink / raw)
  To: linux-btrfs

On 21/06/16 12:51, Austin S. Hemmelgarn wrote:
> The scrub design works, but the whole state file thing has some rather
> irritating side effects and other implications, and developed out of
> requirements that aren't present for balance (it might be nice to check
> how many chunks actually got balanced after the fact, but it's not
> absolutely necessary).

Actually, that would be **really** useful.  I have been experimenting
with cancelling balances after a certain time (as part of my
"balance-slowly" script).  I have got it working, just using bash
scripting, but it means my script does not know whether any work has
actually been done by the balance run which was cancelled (if no work
was done, but it timed out anyway, there is probably no point trying
again with the same timeout later!).
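
The time-limited approach described above could look roughly like the
following. This is only a sketch under stated assumptions, not the
actual "balance-slowly" script: balance_with_timeout and the BTRFS
variable are hypothetical names, and the return value 124 is borrowed
from timeout(1) rather than being any btrfs convention. It uses
"btrfs balance cancel" instead of a signal because, as noted elsewhere
in this thread, signals sent to the userspace process do not stop the
kernel side.

```sh
#!/bin/sh
# Hypothetical helper, not part of btrfs-progs.
BTRFS=${BTRFS:-btrfs}

# balance_with_timeout <seconds> <mountpoint> [filter...]
# Returns the balance's own exit status if it finished in time,
# or 124 if it had to be cancelled.
balance_with_timeout() {
    limit=$1 mnt=$2
    shift 2
    "$BTRFS" balance start "$@" "$mnt" &
    pid=$!
    elapsed=0
    # Poll once per second while the foreground balance is still alive.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            # Ask the kernel to stop, then reap the userspace process.
            "$BTRFS" balance cancel "$mnt"
            wait "$pid" || :
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"
}
```

For instance, "balance_with_timeout 1800 /mnt" would give the balance
at most about 30 minutes. The sketch still does not report how much
work a cancelled run completed, which is the gap discussed here.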

Graham





* Re: Is "btrfs balance start" truly asynchronous?
  2016-06-21 13:17         ` Graham Cobb
@ 2016-06-21 13:44           ` Lionel Bouton
  2016-06-21 23:30           ` Dmitry Katsubo
  1 sibling, 0 replies; 9+ messages in thread
From: Lionel Bouton @ 2016-06-21 13:44 UTC (permalink / raw)
  To: Graham Cobb, linux-btrfs

On 21/06/2016 15:17, Graham Cobb wrote:
> On 21/06/16 12:51, Austin S. Hemmelgarn wrote:
>> The scrub design works, but the whole state file thing has some rather
>> irritating side effects and other implications, and developed out of
>> requirements that aren't present for balance (it might be nice to check
>> how many chunks actually got balanced after the fact, but it's not
>> absolutely necessary).
> Actually, that would be **really** useful.  I have been experimenting
> with cancelling balances after a certain time (as part of my
> "balance-slowly" script).  I have got it working, just using bash
> scripting, but it means my script does not know whether any work has
> actually been done by the balance run which was cancelled (if no work
> was done, but it timed out anyway, there is probably no point trying
> again with the same timeout later!).

I have the exact same use case.

We trigger balances when we detect that most of the free space is
allocated but unused, to prevent possible ENOSPC events. A balance on
busy disks can slow down other I/O, so we try to limit its duration (in
our use case 15 to 30 min max is mostly OK).
Trying to emulate this by using [d|v]range was a possibility too, but I
thought it could be hard to get right. We actually inspect the allocated
space before and after to report the difference, but we don't know
whether this difference is caused by the aborted balance or by other
activity (we have to read the kernel logs to find out).
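
One possible way to attribute progress to the balance itself, rather
than to other activity, is to sample "btrfs balance status" shortly
before cancelling. The sketch below only parses text. It assumes the
status line has the form "N out of about M chunks balanced (K
considered), P% left", which is how btrfs-progs of this era appears to
print it; that format is an assumption and may differ between
versions. The name chunks_balanced is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper, not part of btrfs-progs.

# chunks_balanced: read "btrfs balance status" output on stdin and print
# the number of chunks balanced so far. Prints nothing if no matching
# line is found (for example when no balance is running).
chunks_balanced() {
    sed -n 's/^\([0-9][0-9]*\) out of about [0-9][0-9]* chunks balanced.*/\1/p'
}
```

Typical use would be "btrfs balance status /mnt | chunks_balanced"
right before "btrfs balance cancel /mnt"; an empty or zero result
suggests the aborted run did no work.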

Lionel


* Re: Is "btrfs balance start" truly asynchronous?
  2016-06-21 13:17         ` Graham Cobb
  2016-06-21 13:44           ` Lionel Bouton
@ 2016-06-21 23:30           ` Dmitry Katsubo
  1 sibling, 0 replies; 9+ messages in thread
From: Dmitry Katsubo @ 2016-06-21 23:30 UTC (permalink / raw)
  To: linux-btrfs

On 2016-06-21 15:17, Graham Cobb wrote:
> On 21/06/16 12:51, Austin S. Hemmelgarn wrote:
>> The scrub design works, but the whole state file thing has some rather
>> irritating side effects and other implications, and developed out of
>> requirements that aren't present for balance (it might be nice to check
>> how many chunks actually got balanced after the fact, but it's not
>> absolutely necessary).
> 
> Actually, that would be **really** useful.  I have been experimenting
> with cancelling balances after a certain time (as part of my
> "balance-slowly" script).  I have got it working, just using bash
> scripting, but it means my script does not know whether any work has
> actually been done by the balance run which was cancelled (if no work
> was done, but it timed out anyway, there is probably no point trying
> again with the same timeout later!).

Additionally, it would be nice if balance/scrub reported its status via
/proc in a human-readable manner (similar to /proc/mdstat).

-- 
With best regards,
Dmitry
