* Several questions regarding btrfs
@ 2017-10-31 16:23 ST
  2017-10-31 17:45 ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: ST @ 2017-10-31 16:23 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I've recently learned about btrfs and am considering using it for my needs.
I have several questions in this regard:

I manage a dedicated server remotely and have some sort of script that
installs an OS from several images. There I can define partitions and
their FSs.

1. By default the script provides a small separate partition for /boot
with ext3. Does it have any advantages or can I simply have /boot
within / all on btrfs? (Note: the OS is Debian9)

2. as for /, I get approximately the following written to /etc/fstab:
UUID=blah_blah /dev/sda3 / btrfs ...
So top-level volume is populated after initial installation with the
main filesystem dir-structure (/bin /usr /home, etc..). As per btrfs
wiki I would like top-level volume to have only subvolumes (at least,
the one mounted as /) and snapshots. I can make a snapshot of the
top-level volume with the / structure, but how can I get rid of all the
directories within the top-level volume and keep only the subvolume
containing / (and later snapshots), unmount it and then mount the
snapshot that I took? rm -rf / - is not a good idea...

3. in my current ext4-based setup I have two servers, where one syncs
the files of a certain dir to the other using lsyncd (which launches
rsync on inotify events). As far as I understand, it is more efficient
to use btrfs send/receive (over ssh) than rsync (over ssh) to sync two
boxes. Do you think it would be possible to make lsyncd use btrfs for
syncing instead of rsync? I.e. can btrfs work with inotify events? Has
somebody tried it already?
Otherwise I can sync using btrfs send/receive from within cron every
10-15 minutes, but it seems less elegant.

4. When compression is used - what is the quota based on: (a) the
amount of GBs the data actually consumes on the hard drive in its
compressed state, or (b) the amount of GBs the data occupies naturally
in uncompressed form? I need to set quotas as in (b). Is it possible?
If not - should I file a feature request?

Thank you in advance!



* Re: Several questions regarding btrfs
  2017-10-31 16:23 Several questions regarding btrfs ST
@ 2017-10-31 17:45 ` Austin S. Hemmelgarn
  2017-10-31 18:51   ` Andrei Borzenkov
  2017-10-31 20:06   ` ST
  0 siblings, 2 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-10-31 17:45 UTC (permalink / raw)
  To: ST, linux-btrfs

On 2017-10-31 12:23, ST wrote:
> Hello,
> 
> I've recently learned about btrfs and consider to utilize for my needs.
> I have several questions in this regard:
> 
> I manage a dedicated server remotely and have some sort of script that
> installs an OS from several images. There I can define partitions and
> their FSs.
> 
> 1. By default the script provides a small separate partition for /boot
> with ext3. Does it have any advantages or can I simply have /boot
> within / all on btrfs? (Note: the OS is Debian9)
It depends on the boot loader.  I think Debian 9's version of GRUB has 
no issue with BTRFS, but see the response below to your question on 
subvolumes for the one caveat.
> 
> 2. as for the / I get ca. following written to /etc/fstab:
> UUID=blah_blah /dev/sda3 / btrfs ...
> So top-level volume is populated after initial installation with the
> main filesystem dir-structure (/bin /usr /home, etc..). As per btrfs
> wiki I would like top-level volume to have only subvolumes (at least,
> the one mounted as /) and snapshots. I can make a snapshot of the
> top-level volume with / structure, but how can get rid of all the
> directories within top-lvl volume and keep only the subvolume
> containing / (and later snapshots), unmount it and then mount the
> snapshot that I took? rm -rf / - is not a good idea...
There are three approaches to doing this, from a live environment, from 
single user mode running with init=/bin/bash, or from systemd emergency 
mode.  Doing it from a live environment is much safer overall, even if 
it does take a bit longer.  I'm listing the last two methods here only 
for completeness, and I very much suggest that you use the first (do it 
from a live environment).

Regardless of which method you use, if you don't have a separate boot 
partition, you will have to create a symlink called /boot outside the 
subvolume, pointing at the boot directory inside the subvolume, or 
change the boot loader to look at the new location for /boot.

From a live environment, it's pretty simple overall, though it's much 
easier if your live environment matches your distribution:
1. Create the snapshot of the root, naming it what you want the 
subvolume to be called (I usually just call it root, SUSE and Ubuntu 
call it @, others may have different conventions).
2. Delete everything except the snapshot you just created.  The safest 
way to do this is to explicitly list each individual top-level directory 
to delete.
3. Use `btrfs subvolume list` to figure out the subvolume ID for the 
subvolume you just created, and then set that as the default subvolume 
with `btrfs subvolume set-default SUBVOLID /path`.  Once you do this, 
you will need to specify subvolid=5 in the mount options to get the real 
top-level subvolume.
4. Reboot.
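
Putting it together, the whole thing might look roughly like this (a
minimal sketch; the device name /dev/sda3 and the subvolume name '@'
are assumptions, adjust them to your layout):

    mount /dev/sda3 /mnt
    btrfs subvolume snapshot /mnt /mnt/@
    cd /mnt
    # list the old top-level directories explicitly; never use a bare
    # wildcard here, and obviously don't include '@'
    rm -rf -- bin boot etc home lib* opt root sbin srv tmp usr var
    btrfs subvolume list /mnt             # note the ID of '@', e.g. 257
    btrfs subvolume set-default 257 /mnt
    cd / && umount /mnt
    reboot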

For single user mode (check further down for what to do with systemd, 
also note that this may brick your system if you get it wrong):
1. When booting up the system, stop the bootloader and add 
'init=/bin/bash' to the kernel command line before booting.
2. When you get a shell prompt, create the snapshot, just like above.
3. Run the following:
'cd /path ; mkdir old_root ; pivot_root . old_root ; chroot . /bin/bash'
4. You're now running inside the new subvolume, and the old root 
filesystem is mounted at /old_root.  From here, just follow steps 2 to 4 
from the live environment method.

For doing it from emergency mode, things are a bit more complicated:
1. Create the snapshot of the root, just like above.
2. Make sure the only services running are udev and systemd-journald.
3. Run `systemctl switch-root` with the path to the subvolume you just 
created.
4. You're now running inside the new root, systemd _may_ try to go all 
the way to a full boot now.
5. Mount the root filesystem somewhere, and follow steps 2 through 4 of 
the live environment method.
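
Concretely, steps 1 and 3 might look something like this (a sketch;
the device and snapshot names are assumptions, and since switch-root
expects a mount point, the new snapshot is mounted first):

    btrfs subvolume snapshot / /root-new
    mount -o subvol=root-new /dev/sda3 /mnt
    systemctl switch-root /mnt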
> 
> 3. in my current ext4-based setup I have two servers while one syncs
> files of certain dir to the other using lsyncd (which launches rsync on
> inotify events). As far as I have understood it is more efficient to use
> btrfs send/receive (over ssh) than rsync (over ssh) to sync two boxes.
> Do you think it would be possible to make lsyncd to use btrfs for
> syncing instead of rsync? I.e. can btrfs work with inotify events? Did
> somebody try it already?
BTRFS send/receive needs a read-only snapshot to send from.  This means 
that triggering it on inotify events is liable to cause performance 
issues and possibly lose changes (contrary to popular belief, snapshot 
creation is neither atomic nor free).  It also means that if you want to 
match rsync performance in terms of network usage, you're going to have 
to keep the previous snapshot around so you can do an incremental send 
(which is also less efficient than rsync's file comparison, unless rsync 
is checksumming files).

Because of this, it would be pretty complicated right now to get 
reliable lsyncd integration.

> Otherwise I can sync using btrfs send/receive from within cron every
> 10-15 minutes, but it seems less elegant.
When it comes to stuff like this, it's usually best to go for the 
simplest solution that meets your requirements.  Unless you need 
real-time synchronization, inotify is overkill, and unless you need to 
copy reflinks (you probably don't, as almost nothing uses them yet, and 
absolutely nothing I know of depends on them) send/receive is overkill.

As a pretty simple example, we've got a couple of systems that have 
near-line active backups set up.  The data is stored on BTRFS, but we 
just use a handful of parallel rsync invocations every 15 minutes to 
keep the backup system in sync (because of what we do, we can afford to 
lose 15 minutes of data).  It's not 'elegant', but it's immediately 
obvious to any seasoned sysadmin what it's doing, and it gets the job 
done easily syncing the data in question in at most a few minutes.  Back 
when I switched to using BTRFS, I considered using send/receive, but 
even using incremental send/receive still performed worse than rsync.
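
For reference, a cron-driven incremental send/receive would look
something like this (a sketch only; paths and snapshot names are
assumptions, and the rotation of the 'previous' snapshot is left out):

    btrfs subvolume snapshot -r /srv/data /srv/data/.snap-new
    sync    # make sure the new snapshot is fully on disk before sending
    btrfs send -p /srv/data/.snap-prev /srv/data/.snap-new \
        | ssh backup btrfs receive /srv/backup
    # afterwards, delete .snap-prev and rename .snap-new to .snap-prev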
> 
> 4. In a case when compression is used - what quota is based on - (a)
> amount of GBs the data actually consumes on the hard drive while in
> compressed state or (b) amount of GBs the data naturally is in
> uncompressed form. I need to set quotas as in (b). Is it possible? If
> not - should I file a feature request?
I can't directly answer this as I don't know myself (I don't use 
quotas), but I have two comments I would suggest you consider:

1. qgroups (the BTRFS quota implementation) cause scaling and 
performance issues.  Unless you absolutely need quotas (unless you're a 
hosting company, or are dealing with users who don't listen and don't 
pay attention to disk usage, you usually do not need quotas), you're 
almost certainly better off disabling them for now, especially for a 
production system.
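
For completeness, the qgroup workflow itself is short (assuming the
filesystem is mounted at /mnt with a per-user subvolume layout):

    btrfs quota enable /mnt
    btrfs qgroup limit 10G /mnt/home/alice   # cap referenced data at 10GiB
    btrfs qgroup show /mnt                   # requires root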

2. Compression and quotas cause issues regardless of how they interact. 
In case (a), the user has no way of knowing if a given file will fit 
under their quota until they try to create it.  In case (b), actual disk 
usage (as reported by du) will not match up with what the quota says the 
user is using, which makes it harder for them to figure out what to 
delete to free up space.  It's debatable which situation is less 
objectionable for users, though most people I know tend to think in a 
way where the issue with (a) doesn't matter, but the issue with (b) does.


* Re: Several questions regarding btrfs
  2017-10-31 17:45 ` Austin S. Hemmelgarn
@ 2017-10-31 18:51   ` Andrei Borzenkov
  2017-10-31 19:07     ` Austin S. Hemmelgarn
  2017-10-31 20:06   ` ST
  1 sibling, 1 reply; 20+ messages in thread
From: Andrei Borzenkov @ 2017-10-31 18:51 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, ST, linux-btrfs

31.10.2017 20:45, Austin S. Hemmelgarn пишет:
> On 2017-10-31 12:23, ST wrote:
>> Hello,
>>
>> I've recently learned about btrfs and consider to utilize for my needs.
>> I have several questions in this regard:
>>
>> I manage a dedicated server remotely and have some sort of script that
>> installs an OS from several images. There I can define partitions and
>> their FSs.
>>
>> 1. By default the script provides a small separate partition for /boot
>> with ext3. Does it have any advantages or can I simply have /boot
>> within / all on btrfs? (Note: the OS is Debian9)
> It depends on the boot loader.  I think Debian 9's version of GRUB has
> no issue with BTRFS, but see the response below to your question on
> subvolumes for the one caveat.
>>
>> 2. as for the / I get ca. following written to /etc/fstab:
>> UUID=blah_blah /dev/sda3 / btrfs ...
>> So top-level volume is populated after initial installation with the
>> main filesystem dir-structure (/bin /usr /home, etc..). As per btrfs
>> wiki I would like top-level volume to have only subvolumes (at least,
>> the one mounted as /) and snapshots. I can make a snapshot of the
>> top-level volume with / structure, but how can get rid of all the
>> directories within top-lvl volume and keep only the subvolume
>> containing / (and later snapshots), unmount it and then mount the
>> snapshot that I took? rm -rf / - is not a good idea...
> There are three approaches to doing this, from a live environment, from
> single user mode running with init=/bin/bash, or from systemd emergency
> mode.  Doing it from a live environment is much safer overall, even if
> it does take a bit longer.  I'm listing the last two methods here only
> for completeness, and I very much suggest that you use the first (do it
> from a live environment).
> 
> Regardless of which method you use, if you don't have a separate boot
> partition, you will have to create a symlink called /boot outside the
> subvolume, pointing at the boot directory inside the subvolume, or
> change the boot loader to look at the new location for /boot.
> 
> From a live environment, it's pretty simple overall, though it's much
> easier if your live environment matches your distribution:
> 1. Create the snapshot of the root, naming it what you want the
> subvolume to be called (I usually just call it root, SUSE and Ubuntu
> call it @, others may have different conventions).
> 2. Delete everything except the snapshot you just created.  The safest
> way to do this is to explicitly list each individual top-level directory
> to delete.
> 3. Use `btrfs subvolume list` to figure out the subvolume ID for the
> subvolume you just created, and then set that as the default subvolume
> with `btrfs subvolume set-default SUBVOLID /path`.  Once you do this,
> you will need to specify subvolid=5 in the mount options to get the real
> top-level subvolume.

Note that current grub2 works with absolute paths (relative to the
filesystem root). It means that if a) /boot/grub is on btrfs and b) it
is part of the snapshot that becomes the new root, then $prefix (which
points to /boot/grub) in the first-stage grub2 image will be wrong. So
to be on the safe side you would want to reinstall grub2 after this
change.


* Re: Several questions regarding btrfs
  2017-10-31 18:51   ` Andrei Borzenkov
@ 2017-10-31 19:07     ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-10-31 19:07 UTC (permalink / raw)
  To: Andrei Borzenkov, ST, linux-btrfs

On 2017-10-31 14:51, Andrei Borzenkov wrote:
> 31.10.2017 20:45, Austin S. Hemmelgarn пишет:
>> On 2017-10-31 12:23, ST wrote:
>>> Hello,
>>>
>>> I've recently learned about btrfs and consider to utilize for my needs.
>>> I have several questions in this regard:
>>>
>>> I manage a dedicated server remotely and have some sort of script that
>>> installs an OS from several images. There I can define partitions and
>>> their FSs.
>>>
>>> 1. By default the script provides a small separate partition for /boot
>>> with ext3. Does it have any advantages or can I simply have /boot
>>> within / all on btrfs? (Note: the OS is Debian9)
>> It depends on the boot loader.  I think Debian 9's version of GRUB has
>> no issue with BTRFS, but see the response below to your question on
>> subvolumes for the one caveat.
>>>
>>> 2. as for the / I get ca. following written to /etc/fstab:
>>> UUID=blah_blah /dev/sda3 / btrfs ...
>>> So top-level volume is populated after initial installation with the
>>> main filesystem dir-structure (/bin /usr /home, etc..). As per btrfs
>>> wiki I would like top-level volume to have only subvolumes (at least,
>>> the one mounted as /) and snapshots. I can make a snapshot of the
>>> top-level volume with / structure, but how can get rid of all the
>>> directories within top-lvl volume and keep only the subvolume
>>> containing / (and later snapshots), unmount it and then mount the
>>> snapshot that I took? rm -rf / - is not a good idea...
>> There are three approaches to doing this, from a live environment, from
>> single user mode running with init=/bin/bash, or from systemd emergency
>> mode.  Doing it from a live environment is much safer overall, even if
>> it does take a bit longer.  I'm listing the last two methods here only
>> for completeness, and I very much suggest that you use the first (do it
>> from a live environment).
>>
>> Regardless of which method you use, if you don't have a separate boot
>> partition, you will have to create a symlink called /boot outside the
>> subvolume, pointing at the boot directory inside the subvolume, or
>> change the boot loader to look at the new location for /boot.
>>
>>  From a live environment, it's pretty simple overall, though it's much
>> easier if your live environment matches your distribution:
>> 1. Create the snapshot of the root, naming it what you want the
>> subvolume to be called (I usually just call it root, SUSE and Ubuntu
>> call it @, others may have different conventions).
>> 2. Delete everything except the snapshot you just created.  The safest
>> way to do this is to explicitly list each individual top-level directory
>> to delete.
>> 3. Use `btrfs subvolume list` to figure out the subvolume ID for the
>> subvolume you just created, and then set that as the default subvolume
>> with `btrfs subvolume set-default SUBVOLID /path`.  Once you do this,
>> you will need to specify subvolid=5 in the mount options to get the real
>> top-level subvolume.
> 
> Note that current grub2 works with absolute paths (relative to
> filesystem root). It means that if a) /boot/grub is on btrfs and b) it
> is part of snapshot that becomes new root, $prefix (that points to
> /boot/grub) in the first-stage grub2 image will be wrong. So to be on
> safe side you would want to reinstall grub2 after this change.
> 
Generally yes, though you can also make a symlink pointing to the boot 
directory under the new subvolume (snapshot), and things should work 
correctly as far as I know (this works on Gentoo, not sure about other 
distros though).


* Re: Several questions regarding btrfs
  2017-10-31 17:45 ` Austin S. Hemmelgarn
  2017-10-31 18:51   ` Andrei Borzenkov
@ 2017-10-31 20:06   ` ST
  2017-11-01 12:01     ` Austin S. Hemmelgarn
  2017-11-01 12:15     ` Duncan
  1 sibling, 2 replies; 20+ messages in thread
From: ST @ 2017-10-31 20:06 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

Thank you very much for such an informative response!


On Tue, 2017-10-31 at 13:45 -0400, Austin S. Hemmelgarn wrote:
> On 2017-10-31 12:23, ST wrote:
> > Hello,
> > 
> > I've recently learned about btrfs and consider to utilize for my needs.
> > I have several questions in this regard:
> > 
> > I manage a dedicated server remotely and have some sort of script that
> > installs an OS from several images. There I can define partitions and
> > their FSs.
> > 
> > 1. By default the script provides a small separate partition for /boot
> > with ext3. Does it have any advantages or can I simply have /boot
> > within / all on btrfs? (Note: the OS is Debian9)
> It depends on the boot loader.  I think Debian 9's version of GRUB has 
> no issue with BTRFS, but see the response below to your question on 
> subvolumes for the one caveat.
> > 
> > 2. as for the / I get ca. following written to /etc/fstab:
> > UUID=blah_blah /dev/sda3 / btrfs ...
> > So top-level volume is populated after initial installation with the
> > main filesystem dir-structure (/bin /usr /home, etc..). As per btrfs
> > wiki I would like top-level volume to have only subvolumes (at least,
> > the one mounted as /) and snapshots. I can make a snapshot of the
> > top-level volume with / structure, but how can get rid of all the
> > directories within top-lvl volume and keep only the subvolume
> > containing / (and later snapshots), unmount it and then mount the
> > snapshot that I took? rm -rf / - is not a good idea...
> There are three approaches to doing this, from a live environment, from 
> single user mode running with init=/bin/bash, or from systemd emergency 
> mode.  Doing it from a live environment is much safer overall, even if 
> it does take a bit longer.  I'm listing the last two methods here only 
> for completeness, and I very much suggest that you use the first (do it 
> from a live environment).
>
> Regardless of which method you use, if you don't have a separate boot 
> partition, you will have to create a symlink called /boot outside the 
> subvolume, pointing at the boot directory inside the subvolume, or 
> change the boot loader to look at the new location for /boot.
> 
>  From a live environment, it's pretty simple overall, though it's much 
> easier if your live environment matches your distribution:
> 1. Create the snapshot of the root, naming it what you want the 
> subvolume to be called (I usually just call it root, SUSE and Ubuntu 
> call it @, others may have different conventions).
> 2. Delete everything except the snapshot you just created.  The safest 
> way to do this is to explicitly list each individual top-level directory 
> to delete.
> 3. Use `btrfs subvolume list` to figure out the subvolume ID for the 
> subvolume you just created, and then set that as the default subvolume 
> with `btrfs subvolume set-default SUBVOLID /path`.

Do I need to chroot into old_root before doing set-default? Otherwise
it will attempt to set it in the live environment, will it not?

Also, another question in this regard - I tried "set-default" and
then rebooted, and it worked nicely - I indeed landed in the snapshot,
not the top-level volume. However /etc/fstab didn't change and actually
showed that the top-level volume should have been mounted instead. It
seems that "set-default" has higher precedence than fstab...
1. is it true?
2. how do they actually interact?
3. such a discrepancy disturbs me, so how should I tune fstab to reflect
the change? Or maybe I should not?


>   Once you do this, 
> you will need to specify subvolid=5 in the mount options to get the real 
> top-level subvolume.
> 4. Reboot.
> 
> For single user mode (check further down for what to do with systemd, 
> also note that this may brick your system if you get it wrong):
> 1. When booting up the system, stop the bootloader and add 
> 'init=/bin/bash' to the kernel command line before booting.
> 2. When you get a shell prompt, create the snapshot, just like above.
> 3. Run the following:
> 'cd /path ; mkdir old_root ; pivot_root . old_root ; chroot . /bin/bash'
> 4. You're now running inside the new subvolume, and the old root 
> filesystem is mounted at /old_root.  From here, just follow steps 2 to 4 
> from the live environment method.
> 
> For doing it from emergency mode, things are a bit more complicated:
> 1. Create the snapshot of the root, just like above.
> 2. Make sure the only services running are udev and systemd-journald.
> 3. Run `systemctl switch-root` with the path to the subvolume you just 
> created.
> 4. You're now running inside the new root, systemd _may_ try to go all 
> the way to a full boot now.
> 5. Mount the root filesystem somewhere, and follow steps 2 through 4 of 
> the live environment method.
> > 
> > 3. in my current ext4-based setup I have two servers while one syncs
> > files of certain dir to the other using lsyncd (which launches rsync on
> > inotify events). As far as I have understood it is more efficient to use
> > btrfs send/receive (over ssh) than rsync (over ssh) to sync two boxes.
> > Do you think it would be possible to make lsyncd to use btrfs for
> > syncing instead of rsync? I.e. can btrfs work with inotify events? Did
> > somebody try it already?
> BTRFS send/receive needs a read-only snapshot to send from.  This means 
> that triggering it on inotify events is liable to cause performance 
> issues and possibly lose changes

Actually triggering doesn't happen on each and every inotify event.
lsyncd has an option to define a time interval within which all inotify
events are accumulated and only then rsync is launched. It could be 5-10
seconds or more, which is quasi-real-time sync. Do you still hold that
it will not work with BTRFS send/receive (i.e. keeping the previous
snapshot around and creating a new one)?

>  (contrary to popular belief, snapshot 
> creation is neither atomic nor free).  It also means that if you want to 
> match rsync performance in terms of network usage, you're going to have 
> to keep the previous snapshot around so you can do an incremental send 
> (which is also less efficient than rsync's file comparison, unless rsync 
> is checksumming files).

Indeed? From what I've read so far I got the impression that rsync is
slower... but I might be wrong... Is this so by design, or can BTRFS
beat rsync in the future (even without checksumming)?


> 
> Because of this, it would be pretty complicated right now to get 
> reliable lsyncd integration.
> 
> > Otherwise I can sync using btrfs send/receive from within cron every
> > 10-15 minutes, but it seems less elegant.
> When it comes to stuff like this, it's usually best to go for the 
> simplest solution that meets your requirements.  Unless you need 
> real-time synchronization, inotify is overkill,

I actually got inotify-based lsyncd working and I like it... however
real-time syncing is not a must, and for several years everything
worked well with a simple rsync in a cron job every 15 minutes. Could
you please elaborate on the disadvantages of lsyncd, so maybe I should
switch back? For example, in which of the two cases is the life of the
hard drive negatively impacted? On one side the data doesn't change too
often, so 98% of the rsyncs from cron are wasted; on the other,
triggering an rsync on inotify might be too intensive a task for a hard
drive? What do you think? What other considerations could there be?


>  and unless you need to 
> copy reflinks (you probably don't, as almost nothing uses them yet, and 
> absolutely nothing I know of depends on them) send/receive is overkill.

I saw in a post that rsync would create a separate copy of a cloned
file (consuming double the space, and maybe traffic?).

> As a pretty simple example, we've got a couple of systems that have 
> near-line active backups set up.  The data is stored on BTRFS, but we 
> just use a handful of parallel rsync invocations every 15 minutes to 
> keep the backup system in sync (because of what we do, we can afford to 
> lose 15 minutes of data).  It's not 'elegant', but it's immediately 
> obvious to any seasoned sysadmin what it's doing, and it gets the job 
> done easily syncing the data in question in at most a few minutes.  Back 
> when I switched to using BTRFS, I considered using send/receive, but 
> even using incremental send/receive still performed worse than rsync.
> > 
> > 4. In a case when compression is used - what quota is based on - (a)
> > amount of GBs the data actually consumes on the hard drive while in
> > compressed state or (b) amount of GBs the data naturally is in
> > uncompressed form. I need to set quotas as in (b). Is it possible? If
> > not - should I file a feature request?
> I can't directly answer this as I don't know myself (I don't use 
> quotas), but have two comments I would suggest you consider:
> 
> 1. qgroups (the BTRFS quota implementation) cause scaling and 
> performance issues.  Unless you absolutely need quotas (unless you're a 
> hosting company, or are dealing with users who don't listen and don't 
> pay attention to disk usage, you usually do not need quotas), you're 
> almost certainly better off disabling them for now, especially for a 
> production system.

Ok. I'll use more standard approaches. Which of the following commands
will work with BTRFS:

https://debian-handbook.info/browse/stable/sect.quotas.html


> 
> 2. Compression and quotas cause issues regardless of how they interact. 
> In case (a), the user has no way of knowing if a given file will fit 
> under their quota until they try to create it.  In case (b), actual disk 
> usage (as reported by du) will not match up with what the quota says the 
> user is using, which makes it harder for them to figure out what to 
> delete to free up space.  It's debatable which is a less objectionable 
> situation for users, though most people I know tend to think in a way 
> that the issue with (a) doesn't matter, but the issue with (b) does.

I think both (a) and (b) should be possible and it should be up to the
sysadmin to choose what he prefers. The concerns of the (b) scenario
could probably be dealt with via some sort of --real-size option to the
du command, while by default it could keep the current behavior (which
might be emphasized with --compressed-size).

Two more questions came to my mind: as I've mentioned above - I have
two boxes, one syncing to the other. No RAID involved. I want to scrub
(or scan - I don't know yet what the difference is...) the whole
filesystem once a month to look for bitrot. Questions:

1. is it a stable setup for production? Let's say I'll sync with rsync
- either from cron or with lsyncd?
2. should any data corruption be discovered - is there any way to heal
it using the copy from the other box over SSH?

Thank you!



* Re: Several questions regarding btrfs
  2017-10-31 20:06   ` ST
@ 2017-11-01 12:01     ` Austin S. Hemmelgarn
  2017-11-01 14:05       ` ST
  2017-11-01 17:52       ` Andrei Borzenkov
  2017-11-01 12:15     ` Duncan
  1 sibling, 2 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-01 12:01 UTC (permalink / raw)
  To: ST; +Cc: linux-btrfs

On 2017-10-31 16:06, ST wrote:
> Thank you very much for such an informative response!
> 
> 
> On Tue, 2017-10-31 at 13:45 -0400, Austin S. Hemmelgarn wrote:
>> On 2017-10-31 12:23, ST wrote:
>>> Hello,
>>>
>>> I've recently learned about btrfs and consider to utilize for my needs.
>>> I have several questions in this regard:
>>>
>>> I manage a dedicated server remotely and have some sort of script that
>>> installs an OS from several images. There I can define partitions and
>>> their FSs.
>>>
>>> 1. By default the script provides a small separate partition for /boot
>>> with ext3. Does it have any advantages or can I simply have /boot
>>> within / all on btrfs? (Note: the OS is Debian9)
>> It depends on the boot loader.  I think Debian 9's version of GRUB has
>> no issue with BTRFS, but see the response below to your question on
>> subvolumes for the one caveat.
>>>
>>> 2. as for the / I get ca. following written to /etc/fstab:
>>> UUID=blah_blah /dev/sda3 / btrfs ...
>>> So top-level volume is populated after initial installation with the
>>> main filesystem dir-structure (/bin /usr /home, etc..). As per btrfs
>>> wiki I would like top-level volume to have only subvolumes (at least,
>>> the one mounted as /) and snapshots. I can make a snapshot of the
>>> top-level volume with / structure, but how can get rid of all the
>>> directories within top-lvl volume and keep only the subvolume
>>> containing / (and later snapshots), unmount it and then mount the
>>> snapshot that I took? rm -rf / - is not a good idea...
>> There are three approaches to doing this, from a live environment, from
>> single user mode running with init=/bin/bash, or from systemd emergency
>> mode.  Doing it from a live environment is much safer overall, even if
>> it does take a bit longer.  I'm listing the last two methods here only
>> for completeness, and I very much suggest that you use the first (do it
>> from a live environment).
>>
>> Regardless of which method you use, if you don't have a separate boot
>> partition, you will have to create a symlink called /boot outside the
>> subvolume, pointing at the boot directory inside the subvolume, or
>> change the boot loader to look at the new location for /boot.
>>
>>   From a live environment, it's pretty simple overall, though it's much
>> easier if your live environment matches your distribution:
>> 1. Create the snapshot of the root, naming it what you want the
>> subvolume to be called (I usually just call it root, SUSE and Ubuntu
>> call it @, others may have different conventions).
>> 2. Delete everything except the snapshot you just created.  The safest
>> way to do this is to explicitly list each individual top-level directory
>> to delete.
>> 3. Use `btrfs subvolume list` to figure out the subvolume ID for the
>> subvolume you just created, and then set that as the default subvolume
>> with `btrfs subvolume set-default SUBVOLID /path`.
> 
> Do I need to chroot into old_root before doing set-default? Otherwise it
> will attempt to set in the live environment, will it not?
The `subvolume set-default` command operates on a filesystem, not an 
environment, since the default subvolume is stored in the filesystem 
itself (it would be kind of pointless otherwise).  The `/path` above 
should be replaced with where you have the filesystem mounted, but it 
doesn't matter what your environment is when you call it (as long as the 
filesystem is mounted of course).
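
In other words, from any environment you could do something like this
(the device name and the ID 257 are illustrative):

    mount -o subvolid=5 /dev/sda3 /mnt    # mount the real top level
    btrfs subvolume list /mnt             # find the new subvolume's ID
    btrfs subvolume set-default 257 /mnt
    umount /mnt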
> 
> Also another questions in this regard - I tried to "set-default" and
> then reboot and it worked nice - I landed indeed in the snapshot, not
> top-level volume. However /etc/fstab didn't change and actually showed
> that top-level volume should have been mounted instead. It seems that
> "set-default" has higher precedence than fstab...
> 1. is it true?
> 2. how do they actually interact?
> 3. such a discrepancy disturbs me, so how should I tune fstab to reflect
> the change? Or maybe I should not?
The default subvolume is what gets mounted if you don't specify a 
subvolume to mount.  On a newly created filesystem, it's subvolume ID 5, 
which is the top-level of the filesystem itself.  Debian does not 
specify a subvolume in /etc/fstab during the installation, so setting 
the default subvolume will control what gets mounted.  If you were to 
add a 'subvol=' or 'subvolid=' mount option to /etc/fstab for that 
filesystem, that would override the default subvolume.

The reason I say to set the default subvolume instead of editing 
/etc/fstab is a pretty simple one though.  If you edit /etc/fstab and 
don't set the default subvolume, you will need to mess around with the 
bootloader configuration (and possibly rebuild the initramfs) to make 
the system bootable again, whereas by setting the default subvolume, the 
system will just boot as-is without needing any other configuration changes.
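
If you did want /etc/fstab to state it explicitly anyway, the line
would look something like this (the subvolume name '@' is an
assumption):

    UUID=blah_blah  /  btrfs  defaults,subvol=@  0  0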
> 
>>    Once you do this,
>> you will need to specify subvolid=5 in the mount options to get the real
>> top-level subvolume.
>> 4. Reboot.
>>
>> For single user mode (check further down for what to do with systemd,
>> also note that this may brick your system if you get it wrong):
>> 1. When booting up the system, stop the bootloader and add
>> 'init=/bin/bash' to the kernel command line before booting.
>> 2. When you get a shell prompt, create the snapshot, just like above.
>> 3. Run the following:
>> 'cd /path ; mkdir old_root ; pivot_root . old_root ; chroot . /bin/bash'
>> 4. You're now running inside the new subvolume, and the old root
>> filesystem is mounted at /old_root.  From here, just follow steps 2 to 4
>> from the live environment method.
>>
>> For doing it from emergency mode, things are a bit more complicated:
>> 1. Create the snapshot of the root, just like above.
>> 2. Make sure the only services running are udev and systemd-journald.
>> 3. Run `systemctl switch-root` with the path to the subvolume you just
>> created.
>> 4. You're now running inside the new root, systemd _may_ try to go all
>> the way to a full boot now.
>> 5. Mount the root filesystem somewhere, and follow steps 2 through 4 of
>> the live environment method.
>>>
>>> 3. in my current ext4-based setup I have two servers while one syncs
>>> files of certain dir to the other using lsyncd (which launches rsync on
>>> inotify events). As far as I have understood it is more efficient to use
>>> btrfs send/receive (over ssh) than rsync (over ssh) to sync two boxes.
>>> Do you think it would be possible to make lsyncd to use btrfs for
>>> syncing instead of rsync? I.e. can btrfs work with inotify events? Did
>>> somebody try it already?
>> BTRFS send/receive needs a read-only snapshot to send from.  This means
>> that triggering it on inotify events is liable to cause performance
>> issues and possibly lose changes
> 
> Actually triggering doesn't happen on each and every inotify event.
> lsyncd has an option to define a time interval within which all inotify
> events are accumulated and only then rsync is launched. It could be 5-10
> seconds or more. Which is quasi real time sync. Do you  still hold that
> it will not work with BTRFS send/receive (i.e. keeping previous snapshot
> around and creating a new one)?
Okay, I actually didn't know that.  Depending on how lsyncd invokes 
rsync though (does it call rsync with the exact paths or just on the 
whole directory?), it may still be less efficient to use BTRFS send/receive.
> 
>>   (contrary to popular belief, snapshot
>> creation is neither atomic nor free).  It also means that if you want to
>> match rsync performance in terms of network usage, you're going to have
>> to keep the previous snapshot around so you can do an incremental send
>> (which is also less efficient than rsync's file comparison, unless rsync
>> is checksumming files).
> 
> Indeed? From what I've read so far I got an impression that rsync is
> slower... but I might be wrong... Is this by design so, or can BTRFS
> beat rsync in future (even without checksumming)?
It really depends.  BTRFS send/receive transfers _everything_, period. 
Any xattrs, any ACLs, any other metadata, everything.  Rsync can 
optionally not transfer some of that data (and by default doesn't), so 
if you don't need all of that (and most people don't need xattrs or 
ACLs transferred), rsync is usually going to be faster.  When you 
actually are transferring everything, send/receive is probably faster, 
and it's definitely faster than rsync with checksumming.
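
To illustrate the difference (paths are illustrative; plain -a covers
neither hard links, ACLs, nor xattrs):

    rsync -a --delete /srv/data/ backup:/srv/data/      # typical usage
    rsync -aHAX --delete /srv/data/ backup:/srv/data/   # closer to
                                        # send/receive's fidelity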

There's one other issue at hand though that I had forgotten to mention. 
The current implementation of send/receive doesn't properly validate 
sources for reflinks, which means it's possible to create an information 
leak with a carefully crafted send stream and some pretty minimal 
knowledge of the destination filesystem.  Whether or not this matters is 
of course specific to your use case.
> 
>>
>> Because of this, it would be pretty complicated right now to get lsyncd
>> reliable integration.
>>
>>> Otherwise I can sync using btrfs send/receive from within cron every
>>> 10-15 minutes, but it seems less elegant.
>> When it comes to stuff like this, it's usually best to go for the
>> simplest solution that meets your requirements.  Unless you need
>> real-time synchronization, inotify is overkill,
> 
> I actually got inotify-based lsyncd working and I like it... however
> real-time syncing is not a must, and several years everything worked
> well with a simple rsync within a cron every 15 minutes. Could you
> please elaborate on the disadvantages of lsyncd, so maybe I should
> switch back? For example, in which of two cases the life of the hard
> drive is negatively impacted? On one side the data doesn't change too
> often, so 98% of rsync's from cron are wasted, on the other triggering a
> rsync on inotify might be too intensive task for a hard drive? What do
> you think? What other considerations could be?
The biggest one is largely irrelevant if lsyncd batches transfers, and 
arises from the possibility of events firing faster than you can handle 
them (which runs the risk of events getting lost, and in turn things 
getting out of sync).  The other big one (for me at least) is 
determinism.  With a cron job, you know exactly when things will get 
copied, and in turn exactly when the system will potentially be under 
increased load (which makes it a lot easier to quickly explain to users 
why whatever they were doing unexpectedly took longer than normal).
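
For the record, the cron variant is a one-liner, e.g. in /etc/cron.d
(paths illustrative; note the extra user field in this format):

    */15 * * * * root rsync -a --delete /srv/data/ backup:/srv/data/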
> 
> 
>>   and unless you need to
>> copy reflinks (you probably don't, as almost nothing uses them yet, and
>> absolutely nothing I know of depends on them) send/receive is overkill.
> 
> I saw in a post that rsync would create a separate copy of a cloned file
> (consuming double space and maybe traffic?)
That's correct, but you technically need to have that extra space in 
most cases anyway, since you can't assume nothing will write to that 
file and double the space usage.
> 
>> As a pretty simple example, we've got a couple of systems that have
>> near-line active backups set up.  The data is stored on BTRFS, but we
>> just use a handful of parallel rsync invocations every 15 minutes to
>> keep the backup system in sync (because of what we do, we can afford to
>> lose 15 minutes of data).  It's not 'elegant', but it's immediately
>> obvious to any seasoned sysadmin what it's doing, and it gets the job
>> done easily syncing the data in question in at most a few minutes.  Back
>> when I switched to using BTRFS, I considered using send/receive, but
>> even using incremental send/receive still performed worse than rsync.
>>>
>>> 4. In a case when compression is used - what quota is based on - (a)
>>> amount of GBs the data actually consumes on the hard drive while in
>>> compressed state or (b) amount of GBs the data naturally is in
>>> uncompressed form. I need to set quotas as in (b). Is it possible? If
>>> not - should I file a feature request?
>> I can't directly answer this as I don't know myself (I don't use
>> quotas), but have two comments I would suggest you consider:
>>
>> 1. qgroups (the BTRFS quota implementation) cause scaling and
>> performance issues.  Unless you absolutely need quotas (unless you're a
>> hosting company, or are dealing with users who don't listen and don't
>> pay attention to disk usage, you usually do not need quotas), you're
>> almost certainly better off disabling them for now, especially for a
>> production system.
> 
> Ok. I'll use more standard approaches. Which of following commands will
> work with BTRFS:
> 
> https://debian-handbook.info/browse/stable/sect.quotas.html
None, qgroups are the only option right now with BTRFS, and it's pretty 
likely to stay that way since the internals of the filesystem don't fit 
well within the semantics of the regular VFS quota API.  However, 
provided you're not using huge numbers of reflinks and subvolumes, you 
should be fine using qgroups.

However, it's important to know that if your users have shell access, 
they can bypass qgroups.  Normal users can create subvolumes, and new 
subvolumes aren't added to an existing qgroup by default (and unless I'm 
mistaken, aren't constrained by the qgroup set on the parent subvolume), 
so simple shell access is enough to bypass quotas.
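
That is, assuming the default behavior described above, escaping the
accounting takes nothing more than:

    # as an ordinary, unprivileged user:
    btrfs subvolume create ~/unaccounted
    # data written here lands in a fresh qgroup, outside the limit set
    # on the parent subvolume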
> 
>>
>> 2. Compression and quotas cause issues regardless of how they interact.
>> In case (a), the user has no way of knowing if a given file will fit
>> under their quota until they try to create it.  In case (b), actual disk
>> usage (as reported by du) will not match up with what the quota says the
>> user is using, which makes it harder for them to figure out what to
>> delete to free up space.  It's debatable which is a less objectionable
>> situation for users, though most people I know tend to think in a way
>> that the issue with (a) doesn't matter, but the issue with (b) does.
> 
> I think both (a) and (b) should be possible and it should be up to
> sysadmin to choose what he prefers. The concerns of the (b) scenario
> probably could be dealt with some sort of --real-size to the du command,
> while by default it could have behavior (which might be emphasized with
> --compressed-size).
Reporting anything but the compressed size by default in du would mean 
it doesn't behave as existing software expects it to.  It's supposed to 
report actual disk usage (in contrast to the sum of file sizes), which 
means for example that a 1G sparse file with only 64k of data is 
supposed to be reported as being 64k by du.
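
For example (the file name is illustrative, works on any filesystem):

    truncate -s 1G sparse.img              # 1GiB apparent size, no data
    dd if=/dev/urandom of=sparse.img bs=64K count=1 conv=notrunc
    du -h sparse.img                       # ~64K: actual allocation
    du -h --apparent-size sparse.img       # 1.0G: sum of file sizes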
> 
> Two more question came to my mind: as I've mentioned above - I have two
> boxes one syncs to another. No RAID involved. I want to scrub (or scan -
> don't know yet, what is the difference...) the whole filesystem once in
> a month to look for bitrot. Questions:
> 
> 1. is it a stable setup for production? Let's say I'll sync with rsync -
> either in cron or in lsyncd?
Reasonably, though depending on how much data you have and other 
environmental constraints, you may want to scrub a bit more frequently.
> 2. should any data corruption be discovered - is there any way to heal
> it using the copy from the other box over SSH?
Provided you know which file is affected, yes, you can fix it by just 
copying the file back from the other system.
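
The monthly check plus the manual repair would look something like this
(a sketch; host and file names are illustrative, and scrub usually
names the affected file in the kernel log):

    btrfs scrub start -B /        # -B waits and prints stats when done
    btrfs scrub status /
    # restore just the damaged file from the other box:
    scp otherbox:/srv/data/damaged-file /srv/data/damaged-file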



* Re: Several questions regarding btrfs
  2017-10-31 20:06   ` ST
  2017-11-01 12:01     ` Austin S. Hemmelgarn
@ 2017-11-01 12:15     ` Duncan
  1 sibling, 0 replies; 20+ messages in thread
From: Duncan @ 2017-11-01 12:15 UTC (permalink / raw)
  To: linux-btrfs

ST posted on Tue, 31 Oct 2017 22:06:24 +0200 as excerpted:

> Also another questions in this regard - I tried to "set-default" and
> then reboot and it worked nice - I landed indeed in the snapshot, not
> top-level volume. However /etc/fstab didn't change and actually showed
> that top-level volume should have been mounted instead. It seems that
> "set-default" has higher precedence than fstab...
> 1. is it true?
> 2. how do they actually interact?
> 3. such a discrepancy disturbs me, so how should I tune fstab to reflect
> the change? Or maybe I should not?

For most distros, for root, the /etc/fstab entry is a dummy of sorts.  
The kernel must have the information for root before it can read
/etc/fstab, and it's usually either fed to the kernel on the kernel 
commandline (via root=, rootfstype= and rootflags=) or configured in the 
initr*, tho those may be indirectly sourced from /etc/fstab via scripts 
that set them up, and there's a kernel default that applies without a 
configured commandline, that distros may setup for their own defaults.
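
For illustration, such a commandline might look like this (values are
examples only):

    root=UUID=blah_blah rootfstype=btrfs rootflags=subvol=@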

The /etc/fstab entry may be used when remounting root writable, as it's 
normally mounted read-only first and only remounted writable later, but 
some distros may either do that without reading the fstab entry as well, 
or be configured to leave root mounted read-only (as I've configured my 
system here, on gentoo).


So presumably whatever's actually being used by your kernel to find the 
root to mount, the commandline, the initr*, or the configured kernel 
defaults, doesn't have a specific subvolume option and (for btrfs) is 
simply depending on the btrfs default subvolume being pointed at the 
right subvolume.  As such, configuring btrfs to point at a different 
subvolume "just works", since it's just using the filesystem default 
subvolume in the first place.

Which should work fine as long as whatever configured default subvolume 
ends up having a valid root configuration.  I'd thus be most worried 
about testing that you can point it at whatever you are using as a backup 
and/or emergency boot and maintenance image, and successfully boot from 
that, should the default subvolume get screwed up and become unbootable 
for whatever reason.  Of course that'll require being able to either know 
where the kernel is getting its root information in order to change it, 
or at minimum, being able to successfully override it with a higher 
priority config, when necessary.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Several questions regarding btrfs
  2017-11-01 12:01     ` Austin S. Hemmelgarn
@ 2017-11-01 14:05       ` ST
  2017-11-01 15:31         ` Lukas Pirl
  2017-11-01 17:20         ` Austin S. Hemmelgarn
  2017-11-01 17:52       ` Andrei Borzenkov
  1 sibling, 2 replies; 20+ messages in thread
From: ST @ 2017-11-01 14:05 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs


> >>> 3. in my current ext4-based setup I have two servers while one syncs
> >>> files of certain dir to the other using lsyncd (which launches rsync on
> >>> inotify events). As far as I have understood it is more efficient to use
> >>> btrfs send/receive (over ssh) than rsync (over ssh) to sync two boxes.
> >>> Do you think it would be possible to make lsyncd to use btrfs for
> >>> syncing instead of rsync? I.e. can btrfs work with inotify events? Did
> >>> somebody try it already?
> >> BTRFS send/receive needs a read-only snapshot to send from.  This means
> >> that triggering it on inotify events is liable to cause performance
> >> issues and possibly lose changes
> > 
> > Actually triggering doesn't happen on each and every inotify event.
> > lsyncd has an option to define a time interval within which all inotify
> > events are accumulated and only then rsync is launched. It could be 5-10
> > seconds or more. Which is quasi real time sync. Do you  still hold that
> > it will not work with BTRFS send/receive (i.e. keeping previous snapshot
> > around and creating a new one)?
> Okay, I actually didn't know that.  Depending on how lsyncd invokes 
> rsync though (does it call out rsync with the exact paths or just on the 
> whole directory?), it may still be less efficient to use BTRFS send/receive.

I assume on the whole directory, but I'm not sure...

> >>> 4. In a case when compression is used - what quota is based on - (a)
> >>> amount of GBs the data actually consumes on the hard drive while in
> >>> compressed state or (b) amount of GBs the data naturally is in
> >>> uncompressed form. I need to set quotas as in (b). Is it possible? If
> >>> not - should I file a feature request?
> >> I can't directly answer this as I don't know myself (I don't use
> >> quotas), but have two comments I would suggest you consider:
> >>
> >> 1. qgroups (the BTRFS quota implementation) cause scaling and
> >> performance issues.  Unless you absolutely need quotas (unless you're a
> >> hosting company, or are dealing with users who don't listen and don't
> >> pay attention to disk usage, you usually do not need quotas), you're
> >> almost certainly better off disabling them for now, especially for a
> >> production system.
> > 
> > Ok. I'll use more standard approaches. Which of following commands will
> > work with BTRFS:
> > 
> > https://debian-handbook.info/browse/stable/sect.quotas.html
> None, qgroups are the only option right now with BTRFS, and it's pretty 
> likely to stay that way since the internals of the filesystem don't fit 
> well within the semantics of the regular VFS quota API.  However, 
> provided you're not using huge numbers of reflinks and subvolumes, you 
> should be fine using qgroups.

I want to have 7 daily (or 7+4) read-only snapshots per user, for ca.
100 users. I don't expect users to invoke cp --reflink or take
snapshots.

> 
> However, it's important to know that if your users have shell access, 
> they can bypass qgroups.  Normal users can create subvolumes, and new 
> subvolumes aren't added to an existing qgroup by default (and unless I'm 
> mistaken, aren't constrained by the qgroup set on the parent subvolume), 
> so simple shell access is enough to bypass quotas.

I've never done it before, but shouldn't it be possible to just
whitelist the commands users are allowed to use in the SSH config (and
so block creation of subvolumes / cp --reflink)? I actually would have
restricted users to sftp if I knew how to let them change their
passwords whenever they wish. As far as I know it is not possible with
OpenSSH...


> >>
> >> 2. Compression and quotas cause issues regardless of how they interact.
> >> In case (a), the user has no way of knowing if a given file will fit
> >> under their quota until they try to create it.  In case (b), actual disk
> >> usage (as reported by du) will not match up with what the quota says the
> >> user is using, which makes it harder for them to figure out what to
> >> delete to free up space.  It's debatable which is a less objectionable
> >> situation for users, though most people I know tend to think in a way
> >> that the issue with (a) doesn't matter, but the issue with (b) does.
> > 
> > I think both (a) and (b) should be possible and it should be up to
> > sysadmin to choose what he prefers. The concerns of the (b) scenario
> > probably could be dealt with some sort of --real-size to the du command,
> > while by default it could have behavior (which might be emphasized with
> > --compressed-size).
> Reporting anything but the compressed size by default in du would mean 
> it doesn't behave as existing software expect it to.  It's supposed to 
> report actual disk usage (in contrast to the sum of file sizes), which 
> means for example that a 1G sparse file with only 64k of data is 
> supposed to be reported as being 64k by du.

Yes, it shouldn't be the default behavior, but an optional one...

> > Two more question came to my mind: as I've mentioned above - I have two
> > boxes one syncs to another. No RAID involved. I want to scrub (or scan -
> > don't know yet, what is the difference...) the whole filesystem once in
> > a month to look for bitrot. Questions:
> > 
> > 1. is it a stable setup for production? Let's say I'll sync with rsync -
> > either in cron or in lsyncd?
> Reasonably, though depending on how much data and other environmental 
> constraints, you may want to scrub a bit more frequently.
> > 2. should any data corruption be discovered - is there any way to heal
> > it using the copy from the other box over SSH?
> Provided you know which file is affected, yes, you can fix it by just 
> copying the file back from the other system.
Ok, but there is no automatic fixing in such a case, right?



* Re: Several questions regarding btrfs
  2017-11-01 14:05       ` ST
@ 2017-11-01 15:31         ` Lukas Pirl
  2017-11-01 17:20         ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 20+ messages in thread
From: Lukas Pirl @ 2017-11-01 15:31 UTC (permalink / raw)
  To: ST; +Cc: linux-btrfs

On 11/01/2017 03:05 PM, ST wrote as excerpted:
>> However, it's important to know that if your users have shell access, 
>> they can bypass qgroups.  Normal users can create subvolumes, and new 
>> subvolumes aren't added to an existing qgroup by default (and unless I'm 
>> mistaken, aren't constrained by the qgroup set on the parent subvolume), 
>> so simple shell access is enough to bypass quotas.

> I never did it before, but shouldn't it be possible to just whitelist
> commands users are allowed to use in the SSH config (and so block
> creation of subvolumes/cp --reflink)? I actually would have restricted
> users to sftp if I knew how to let them change their passwords once they
> wish to. As far as I know it is not possible with OpenSSH...

Possible only via a rather custom setup, I guess. You could
a) force users into a chroot via the sshd configuration
   (chroots need allowed binaries plus their libs and configs etc.),
b) solve the problem with file permissions on all binaries
   (probably a terrible pain to set up (users, groups, …) and maintain)
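
For the SFTP-only flavor of a), OpenSSH's internal-sftp needs no
binaries inside the chroot at all, e.g. in sshd_config (the group name
and path are assumptions):

    Match Group sftponly
        ChrootDirectory /srv/chroot/%u
        ForceCommand internal-sftp
        AllowTcpForwarding no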

Cheers,

Lukas


* Re: Several questions regarding btrfs
  2017-11-01 14:05       ` ST
  2017-11-01 15:31         ` Lukas Pirl
@ 2017-11-01 17:20         ` Austin S. Hemmelgarn
  2017-11-02  9:09           ` ST
  1 sibling, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-01 17:20 UTC (permalink / raw)
  To: ST; +Cc: linux-btrfs

On 2017-11-01 10:05, ST wrote:
> 
>>>>> 3. in my current ext4-based setup I have two servers while one syncs
>>>>> files of certain dir to the other using lsyncd (which launches rsync on
>>>>> inotify events). As far as I have understood it is more efficient to use
>>>>> btrfs send/receive (over ssh) than rsync (over ssh) to sync two boxes.
>>>>> Do you think it would be possible to make lsyncd to use btrfs for
>>>>> syncing instead of rsync? I.e. can btrfs work with inotify events? Did
>>>>> somebody try it already?
>>>> BTRFS send/receive needs a read-only snapshot to send from.  This means
>>>> that triggering it on inotify events is liable to cause performance
>>>> issues and possibly lose changes
>>>
>>> Actually triggering doesn't happen on each and every inotify event.
>>> lsyncd has an option to define a time interval within which all inotify
>>> events are accumulated and only then rsync is launched. It could be 5-10
>>> seconds or more. Which is quasi real time sync. Do you  still hold that
>>> it will not work with BTRFS send/receive (i.e. keeping previous snapshot
>>> around and creating a new one)?
>> Okay, I actually didn't know that.  Depending on how lsyncd invokes
>> rsync though (does it call out rsync with the exact paths or just on the
>> whole directory?), it may still be less efficient to use BTRFS send/receive.
> 
> I assume on the whole directory, but I'm not sure...
> 
>>>>> 4. In a case when compression is used - what quota is based on - (a)
>>>>> amount of GBs the data actually consumes on the hard drive while in
>>>>> compressed state or (b) amount of GBs the data naturally is in
>>>>> uncompressed form. I need to set quotas as in (b). Is it possible? If
>>>>> not - should I file a feature request?
>>>> I can't directly answer this as I don't know myself (I don't use
>>>> quotas), but have two comments I would suggest you consider:
>>>>
>>>> 1. qgroups (the BTRFS quota implementation) cause scaling and
>>>> performance issues.  Unless you absolutely need quotas (unless you're a
>>>> hosting company, or are dealing with users who don't listen and don't
>>>> pay attention to disk usage, you usually do not need quotas), you're
>>>> almost certainly better off disabling them for now, especially for a
>>>> production system.
>>>
>>> Ok. I'll use more standard approaches. Which of the following commands
>>> will work with BTRFS:
>>>
>>> https://debian-handbook.info/browse/stable/sect.quotas.html
>> None, qgroups are the only option right now with BTRFS, and it's pretty
>> likely to stay that way since the internals of the filesystem don't fit
>> well within the semantics of the regular VFS quota API.  However,
>> provided you're not using huge numbers of reflinks and subvolumes, you
>> should be fine using qgroups.
> 
> I want to have 7 daily (or 7+4) read-only snapshots per user, for ca.
> 100 users. I don't expect users to invoke cp --reflink or take
> snapshots.
Based on what you say below about user access, you should be absolutely 
fine then.

There's one other caveat, though: only root can use the qgroup ioctls,
which means that only root can check quotas.
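
For example, this works as root but fails with a permission error for a
normal user (the mount point is made up):

  btrfs qgroup show -pcre /home   # -p/-c show parent/child qgroups,
                                  # -r/-e show referenced/exclusive limits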
> 
>>
>> However, it's important to know that if your users have shell access,
>> they can bypass qgroups.  Normal users can create subvolumes, and new
>> subvolumes aren't added to an existing qgroup by default (and unless I'm
>> mistaken, aren't constrained by the qgroup set on the parent subvolume),
>> so simple shell access is enough to bypass quotas.
> 
> I've never done it before, but shouldn't it be possible to just
> whitelist the commands users are allowed to run in the SSH config (and
> so block creation of subvolumes / cp --reflink)? I actually would have
> restricted users to sftp if I knew how to let them change their
> passwords whenever they wish. As far as I know that is not possible
> with OpenSSH...
Yes, but not with OpenSSH.  Assuming you just want SFTP/SCP and the
ability to change passwords, you can use a program called 'scponly' [1].
It's a replacement shell that allows only the things needed for a very
small set of commands, and it includes support for restricting things to
just SCP/SFTP plus the passwd command.
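
Setting it up is roughly just the following (untested sketch; the
binary's path depends on how your distro packages it):

  echo /usr/bin/scponly >> /etc/shells   # register it as a valid login shell
  chsh -s /usr/bin/scponly someuser      # assign it to the user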
> 
> 
>>>>
>>>> 2. Compression and quotas cause issues regardless of how they interact.
>>>> In case (a), the user has no way of knowing if a given file will fit
>>>> under their quota until they try to create it.  In case (b), actual disk
>>>> usage (as reported by du) will not match up with what the quota says the
>>>> user is using, which makes it harder for them to figure out what to
>>>> delete to free up space.  It's debatable which is a less objectionable
>>>> situation for users, though most people I know tend to think in a way
>>>> that the issue with (a) doesn't matter, but the issue with (b) does.
>>>
>>> I think both (a) and (b) should be possible and it should be up to the
>>> sysadmin to choose what he prefers. The concerns of the (b) scenario
>>> could probably be dealt with by some sort of --real-size option to the
>>> du command, while by default it could keep the current behavior (which
>>> might be emphasized with --compressed-size).
>> Reporting anything but the compressed size by default in du would mean
>> it doesn't behave as existing software expects it to.  It's supposed to
>> report actual disk usage (in contrast to the sum of file sizes), which
>> means for example that a 1G sparse file with only 64k of data is
>> supposed to be reported as being 64k by du.
> 
> Yes, it shouldn't be default behavior, but an optional one...
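
To make the sparse-file example concrete (plain GNU coreutils, nothing
BTRFS-specific):

  truncate -s 1G sparse.img          # 1G apparent size, no blocks allocated
  du -h sparse.img                   # ~0: actual disk usage
  du -h --apparent-size sparse.img   # 1.0G: sum of file sizes

For compressed (non-sparse) files, --apparent-size already comes close
to the uncompressed number you're after.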
> 
>>> Two more questions came to my mind: as I've mentioned above - I have two
>>> boxes, one syncing to the other. No RAID involved. I want to scrub (or
>>> scan - I don't know yet what the difference is...) the whole filesystem
>>> once a month to look for bitrot. Questions:
>>>
>>> 1. Is it a stable setup for production? Let's say I'll sync with rsync -
>>> either in cron or in lsyncd?
>> Reasonably, though depending on how much data you have and other
>> environmental constraints, you may want to scrub a bit more frequently.
>>> 2. should any data corruption be discovered - is there any way to heal
>>> it using the copy from the other box over SSH?
>> Provided you know which file is affected, yes, you can fix it by just
>> copying the file back from the other system.
> Ok, but there is no automatic fixing in such a case, right?
Correct.
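
For what it's worth, a minimal sketch of that workflow (path and host
name are made up):

  # /etc/cron.d/btrfs-scrub -- monthly scrub, foreground + quiet
  0 3 1 * * root /bin/btrfs scrub start -Bq /srv/data

  # if the scrub reports an uncorrectable error, the kernel log names
  # the affected file; restore it by hand from the other box:
  scp otherbox:/srv/data/path/to/file /srv/data/path/to/file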


[1] https://github.com/scponly/scponly/wiki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Several questions regarding btrfs
  2017-11-01 12:01     ` Austin S. Hemmelgarn
  2017-11-01 14:05       ` ST
@ 2017-11-01 17:52       ` Andrei Borzenkov
  2017-11-01 18:28         ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 20+ messages in thread
From: Andrei Borzenkov @ 2017-11-01 17:52 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, ST; +Cc: linux-btrfs

01.11.2017 15:01, Austin S. Hemmelgarn wrote:
...
> The default subvolume is what gets mounted if you don't specify a
> subvolume to mount.  On a newly created filesystem, it's subvolume ID 5,
> which is the top-level of the filesystem itself.  Debian does not
> specify a subvolume in /etc/fstab during the installation, so setting
> the default subvolume will control what gets mounted.  If you were to
> add a 'subvol=' or 'subvolid=' mount option to /etc/fstab for that
> filesystem, that would override the default subvolume.
> 
> The reason I say to set the default subvolume instead of editing
> /etc/fstab is a pretty simple one though.  If you edit /etc/fstab and
> don't set the default subvolume, you will need to mess around with the
> bootloader configuration (and possibly rebuild the initramfs) to make
> the system bootable again, whereas by setting the default subvolume, the
> system will just boot as-is without needing any other configuration
> changes.

That breaks as soon as you have nested subvolumes that are not
explicitly mounted, because they are lost in the new snapshot.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Several questions regarding btrfs
  2017-11-01 17:52       ` Andrei Borzenkov
@ 2017-11-01 18:28         ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-01 18:28 UTC (permalink / raw)
  To: Andrei Borzenkov, ST; +Cc: linux-btrfs

On 2017-11-01 13:52, Andrei Borzenkov wrote:
> 01.11.2017 15:01, Austin S. Hemmelgarn wrote:
> ...
>> The default subvolume is what gets mounted if you don't specify a
>> subvolume to mount.  On a newly created filesystem, it's subvolume ID 5,
>> which is the top-level of the filesystem itself.  Debian does not
>> specify a subvolume in /etc/fstab during the installation, so setting
>> the default subvolume will control what gets mounted.  If you were to
>> add a 'subvol=' or 'subvolid=' mount option to /etc/fstab for that
>> filesystem, that would override the default subvolume.
>>
>> The reason I say to set the default subvolume instead of editing
>> /etc/fstab is a pretty simple one though.  If you edit /etc/fstab and
>> don't set the default subvolume, you will need to mess around with the
>> bootloader configuration (and possibly rebuild the initramfs) to make
>> the system bootable again, whereas by setting the default subvolume, the
>> system will just boot as-is without needing any other configuration
>> changes.
> 
> That breaks as soon as you have nested subvolumes that are not
> explicitly mounted, because they are lost in the new snapshot.
> 
Unless they have been created manually, there won't be any such
subvolumes on a Debian system.  Debian treats BTRFS no differently from
any other filesystem during the install, so you get no subvolumes
whatsoever (in contrast to Fedora and SUSE, which treat BTRFS as a volume
manager rather than a filesystem, and thus have subvolumes all over the
place in a default install).

Regardless of whether you update /etc/fstab to point to the new subvolume
or not, any old nested subvolumes need to be either copied (the preferred
method for stuff that isn't supposed to be equivalent to a separate
filesystem) or have entries put in /etc/fstab.
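
For completeness, the conversion would look roughly like this (untested
sketch; /dev/sda3, /mnt and the '@root' name are placeholders, and per
the above, nested subvolumes would need the same treatment):

  btrfs subvolume snapshot / /@root    # capture the current / as a subvolume
  btrfs subvolume list /               # note @root's ID, e.g. 257
  btrfs subvolume set-default 257 /    # the unmodified fstab now mounts @root
  reboot
  # after the reboot, mount the real top level to clean it up:
  mount -o subvolid=5 /dev/sda3 /mnt
  # then delete the old directories from /mnt (not from /!), keeping @root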

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Several questions regarding btrfs
  2017-11-01 17:20         ` Austin S. Hemmelgarn
@ 2017-11-02  9:09           ` ST
  2017-11-02 11:01             ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: ST @ 2017-11-02  9:09 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

> >>>
> >>> Ok. I'll use more standard approaches. Which of the following commands
> >>> will work with BTRFS:
> >>>
> >>> https://debian-handbook.info/browse/stable/sect.quotas.html
> >> None, qgroups are the only option right now with BTRFS, and it's pretty
> >> likely to stay that way since the internals of the filesystem don't fit
> >> well within the semantics of the regular VFS quota API.  However,
> >> provided you're not using huge numbers of reflinks and subvolumes, you
> >> should be fine using qgroups.
> > 
> > I want to have 7 daily (or 7+4) read-only snapshots per user, for ca.
> > 100 users. I don't expect users to invoke cp --reflink or take
> > snapshots.
> Based on what you say below about user access, you should be absolutely 
> fine then.
> 
> There's one other caveat, though: only root can use the qgroup ioctls,
> which means that only root can check quotas.

Only root can check quotas?! That is really strange. How are users
supposed to know they are about to run out of space?... Is this by
design and going to remain like that, or is it just because this feature
is not finished yet?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Several questions regarding btrfs
  2017-11-02  9:09           ` ST
@ 2017-11-02 11:01             ` Austin S. Hemmelgarn
  2017-11-02 15:59               ` ST
  0 siblings, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-02 11:01 UTC (permalink / raw)
  To: ST; +Cc: linux-btrfs

On 2017-11-02 05:09, ST wrote:
>>>>>
>>>>> Ok. I'll use more standard approaches. Which of the following commands
>>>>> will work with BTRFS:
>>>>>
>>>>> https://debian-handbook.info/browse/stable/sect.quotas.html
>>>> None, qgroups are the only option right now with BTRFS, and it's pretty
>>>> likely to stay that way since the internals of the filesystem don't fit
>>>> well within the semantics of the regular VFS quota API.  However,
>>>> provided you're not using huge numbers of reflinks and subvolumes, you
>>>> should be fine using qgroups.
>>>
>>> I want to have 7 daily (or 7+4) read-only snapshots per user, for ca.
>>> 100 users. I don't expect users to invoke cp --reflink or take
>>> snapshots.
>> Based on what you say below about user access, you should be absolutely
>> fine then.
>>
>> There's one other caveat, though: only root can use the qgroup ioctls,
>> which means that only root can check quotas.
> 
> Only root can check quotas?! That is really strange. How are users
> supposed to know they are about to run out of space?... Is this by
> design and going to remain like that, or is it just because this
> feature is not finished yet?
> 
I have no idea if it's intended to be that way, but quite a few things
in BTRFS are root-only that debatably should not be.  I think the quota
ioctls fall under the same category as the tree search ioctl: they
access data that's technically privileged and can let you see things
beyond the mount point they're run on.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Several questions regarding btrfs
  2017-11-02 11:01             ` Austin S. Hemmelgarn
@ 2017-11-02 15:59               ` ST
       [not found]                 ` <E7316F3D-708C-4D5E-AB4B-F54B0B8471C1@rqc.ru>
  0 siblings, 1 reply; 20+ messages in thread
From: ST @ 2017-11-02 15:59 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

> >> There's one other caveat, though: only root can use the qgroup ioctls,
> >> which means that only root can check quotas.
> > 
> > Only root can check quotas?! That is really strange. How are users
> > supposed to know they are about to run out of space?... Is this by
> > design and going to remain like that, or is it just because this
> > feature is not finished yet?
> > 
> I have no idea if it's intended to be that way, but quite a few things 
> in BTRFS are root-only that debatably should not be.  I think the quota 
> ioctls fall under the same category as the tree search ioctl: they 
> access data that's technically privileged and can let you see things 
> beyond the mount point they're run on.

Could somebody among the developers please elaborate on this issue - is
checking quota always going to be done by root? If so - btrfs might be a
no-go for our use case...

Thank you!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Several questions regarding btrfs
       [not found]                 ` <E7316F3D-708C-4D5E-AB4B-F54B0B8471C1@rqc.ru>
@ 2017-11-02 16:28                   ` ST
  2017-11-02 17:13                     ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: ST @ 2017-11-02 16:28 UTC (permalink / raw)
  To: Marat Khalili; +Cc: Austin S. Hemmelgarn, linux-btrfs

On Thu, 2017-11-02 at 19:16 +0300, Marat Khalili wrote:
> > Could somebody among the developers please elaborate on this issue - is
> > checking quota always going to be done by root? If so - btrfs might be
> > a no-go for our use case...
> 
> Not a developer, but a sysadmin here: what prevents you from either
> creating a suid executable for this or configuring sudoers to let users
> call the specific commands they need?

1. If the designers decided to limit access to that info to root only -
they must have had their reasons to do so, and letting everybody do it
is probably contrary to those reasons.

2. I want to limit access to sftp, so there will be no custom commands
to execute...

3. sftp clients (especially those for Windows) can determine quota - and
they probably do it in some standard way - which doesn't seem to be
compatible with btrfs...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Several questions regarding btrfs
  2017-11-02 16:28                   ` ST
@ 2017-11-02 17:13                     ` Austin S. Hemmelgarn
  2017-11-02 17:32                       ` Andrei Borzenkov
  0 siblings, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-02 17:13 UTC (permalink / raw)
  To: ST, Marat Khalili; +Cc: linux-btrfs

On 2017-11-02 12:28, ST wrote:
> On Thu, 2017-11-02 at 19:16 +0300, Marat Khalili wrote:
>>> Could somebody among the developers please elaborate on this issue - is
>>> checking quota always going to be done by root? If so - btrfs might be
>>> a no-go for our use case...
>>
>> Not a developer, but a sysadmin here: what prevents you from either
>> creating a suid executable for this or configuring sudoers to let users
>> call the specific commands they need?
> 
> 1. If the designers decided to limit access to that info to root only -
> they must have had their reasons to do so, and letting everybody do it
> is probably contrary to those reasons.
I wouldn't say this is a compelling argument.  Some things that probably 
should be root only aren't, and others that should not be are, so the 
whole thing is rather haphazard.  Unless one of the developers can 
comment either way, I wouldn't worry too much about this.
> 
> 2. I want to limit access to sftp, so there will be no custom commands
> to execute...
A custom version of the 'quota' command would be easy to add in there. 
In fact, this is really the only option right now, since setting up sudo 
(or doas, or whatever other privilege escalation tool) to allow users to 
check usage requires full access to the 'btrfs' command, which in turn 
opens you up to people escaping their quotas.
> 
> 3. sftp clients (especially those for Windows) can determine quota - and
> they probably do it in some standard way - which doesn't seem to be
> compatible with btrfs...
They call the 'quota' command.  This isn't integrated with BTRFS qgroups 
though because the VFS quota API (which it uses) has significantly 
different semantics than BTRFS quota groups.  VFS quotas are per-user 
(or on rare occasion, per 'project'), whereas BTRFS quota groups apply 
to subvolumes, not users, which is in turn part of why it's possible to 
escape quota requirements on BTRFS.
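
Side by side, the two interfaces look something like this (user and
path are made up):

  quota -u alice                       # VFS quota: per-user, what sftp clients query
  btrfs qgroup show -reF /home/alice   # qgroups: per-subvolume, root only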

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Several questions regarding btrfs
  2017-11-02 17:13                     ` Austin S. Hemmelgarn
@ 2017-11-02 17:32                       ` Andrei Borzenkov
  0 siblings, 0 replies; 20+ messages in thread
From: Andrei Borzenkov @ 2017-11-02 17:32 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, ST, Marat Khalili; +Cc: linux-btrfs

02.11.2017 20:13, Austin S. Hemmelgarn wrote:
>>
>> 2. I want to limit access to sftp, so there will be no custom commands
>> to execute...
> A custom version of the 'quota' command would be easy to add in there.
> In fact, this is really the only option right now, since setting up sudo
> (or doas, or whatever other privilege escalation tool) to allow users to
> check usage requires full access to the 'btrfs' command, which in turn
> opens you up to people escaping their quotas.

It should be possible to allow only "btrfs qgroup show", at least in sudo.
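
Something like this (untested; the group name is made up, and sudo
wildcards match greedily, so keep the pattern as tight as possible):

  # visudo: allow only the read-only qgroup listing, nothing else
  %users ALL=(root) NOPASSWD: /bin/btrfs qgroup show *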


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Several questions regarding btrfs
  2017-10-31 16:29 ST
@ 2017-11-06 21:48 ` waxhead
  0 siblings, 0 replies; 20+ messages in thread
From: waxhead @ 2017-11-06 21:48 UTC (permalink / raw)
  To: ST, linux-btrfs

ST wrote:
> Hello,
>
> I've recently learned about btrfs and consider to utilize for my needs.
> I have several questions in this regard:
>
> I manage a dedicated server remotely and have some sort of script that
> installs an OS from several images. There I can define partitions and
> their FSs.
>
> 1. By default the script provides a small separate partition for /boot
> with ext3. Does it have any advantages or can I simply have /boot
> within / all on btrfs? (Note: the OS is Debian9)
>
I am on Debian as well and run /boot on btrfs on multiple systems without
any issues. Remember to run grub-install on all your disks and update-grub
if you run it in a redundant setup. That way you can lose a disk and
still be happy about it.
If you run a redundant setup like raid1 / raid10, make sure you have
sufficient disks to keep the filesystem from entering read-only mode. See
the status page for details.
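
Concretely that is just (device names are examples):

  grub-install /dev/sda
  grub-install /dev/sdb   # repeat for every disk in the array
  update-grub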

> 2. as for the / I get ca. following written to /etc/fstab:
> UUID=blah_blah /dev/sda3 / btrfs ...
> So top-level volume is populated after initial installation with the
> main filesystem dir-structure (/bin /usr /home, etc..). As per btrfs
> wiki I would like top-level volume to have only subvolumes (at least,
> the one mounted as /) and snapshots. I can make a snapshot of the
> top-level volume with / structure, but how can get rid of all the
> directories within top-lvl volume and keep only the subvolume
> containing / (and later snapshots), unmount it and then mount the
> snapshot that I took? rm -rf / - is not a good idea...
>
There are some tutorials floating around the web for this stuff. Just be
careful: after a system update you might run into boot issues.
(I suggest you try playing with this in a VM first to see what happens.)

> 3. in my current ext4-based setup I have two servers while one syncs
> files of certain dir to the other using lsyncd (which launches rsync on
> inotify events). As far as I have understood it is more efficient to use
> btrfs send/receive (over ssh) than rsync (over ssh) to sync two boxes.
> Do you think it would be possible to make lsyncd to use btrfs for
> syncing instead of rsync? I.e. can btrfs work with inotify events? Did
> somebody try it already?
> Otherwise I can sync using btrfs send/receive from within cron every
> 10-15 minutes, but it seems less elegant.
Have no idea, but since Debian uses systemd you might be able to cook up
something with systemd.path
(https://www.freedesktop.org/software/systemd/man/systemd.path.html).
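
An untested sketch of how that could look (all names and paths are made
up; note that systemd path watches are not recursive, so this only fires
on changes directly inside the watched directory):

  # /etc/systemd/system/srv-sync.path
  [Unit]
  Description=Watch /srv/data for changes

  [Path]
  PathChanged=/srv/data

  [Install]
  WantedBy=multi-user.target

  # /etc/systemd/system/srv-sync.service (activated by the .path unit)
  [Unit]
  Description=Snapshot and btrfs-send the delta to the other box

  [Service]
  Type=oneshot
  ExecStart=/usr/local/sbin/srv-sync.sh

  #!/bin/sh
  # /usr/local/sbin/srv-sync.sh -- untested; the first run needs a full
  # send (no -p), this only covers the incremental case, and /srv/.snap
  # must exist on btrfs on both boxes
  set -e
  prev=$(ls /srv/.snap | tail -n 1)             # newest existing snapshot
  now=$(date +%Y%m%d-%H%M%S)
  btrfs subvolume snapshot -r /srv/data "/srv/.snap/$now"
  btrfs send -p "/srv/.snap/$prev" "/srv/.snap/$now" \
      | ssh otherbox btrfs receive /srv/.snap
  btrfs subvolume delete "/srv/.snap/$prev"     # keep the newest as next parent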

>
> 4. In a case when compression is used - what quota is based on - (a)
> amount of GBs the data actually consumes on the hard drive while in
> compressed state or (b) amount of GBs the data naturally is in
> uncompressed form. I need to set quotas as in (b). Is it possible? If
> not - should I file a feature request?
>
No, it seems you should not need to file a feature request.
Look what me and Google found for you :)
https://btrfs.wiki.kernel.org/index.php/Quota_support
(hint: read the "using limits" section)
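
The short version from that section (paths made up):

  btrfs quota enable /mnt
  btrfs qgroup limit 10G /mnt/home/someuser   # cap that subvolume at 10G

Whether the accounted bytes are pre- or post-compression is something I
would test before relying on it, though.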

> Thank you in advance!
No worries, good luck!
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Several questions regarding btrfs
@ 2017-10-31 16:29 ST
  2017-11-06 21:48 ` waxhead
  0 siblings, 1 reply; 20+ messages in thread
From: ST @ 2017-10-31 16:29 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I've recently learned about btrfs and consider to utilize for my needs.
I have several questions in this regard:

I manage a dedicated server remotely and have some sort of script that
installs an OS from several images. There I can define partitions and
their FSs.

1. By default the script provides a small separate partition for /boot
with ext3. Does it have any advantages or can I simply have /boot
within / all on btrfs? (Note: the OS is Debian9)

2. as for the / I get ca. following written to /etc/fstab:
UUID=blah_blah /dev/sda3 / btrfs ...
So top-level volume is populated after initial installation with the
main filesystem dir-structure (/bin /usr /home, etc..). As per btrfs
wiki I would like top-level volume to have only subvolumes (at least,
the one mounted as /) and snapshots. I can make a snapshot of the
top-level volume with / structure, but how can get rid of all the
directories within top-lvl volume and keep only the subvolume
containing / (and later snapshots), unmount it and then mount the
snapshot that I took? rm -rf / - is not a good idea...

3. in my current ext4-based setup I have two servers while one syncs
files of certain dir to the other using lsyncd (which launches rsync on
inotify events). As far as I have understood it is more efficient to use
btrfs send/receive (over ssh) than rsync (over ssh) to sync two boxes.
Do you think it would be possible to make lsyncd to use btrfs for
syncing instead of rsync? I.e. can btrfs work with inotify events? Did
somebody try it already?
Otherwise I can sync using btrfs send/receive from within cron every
10-15 minutes, but it seems less elegant.

4. In a case when compression is used - what quota is based on - (a)
amount of GBs the data actually consumes on the hard drive while in
compressed state or (b) amount of GBs the data naturally is in
uncompressed form. I need to set quotas as in (b). Is it possible? If
not - should I file a feature request?

Thank you in advance!



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-11-06 21:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-31 16:23 Several questions regarding btrfs ST
2017-10-31 17:45 ` Austin S. Hemmelgarn
2017-10-31 18:51   ` Andrei Borzenkov
2017-10-31 19:07     ` Austin S. Hemmelgarn
2017-10-31 20:06   ` ST
2017-11-01 12:01     ` Austin S. Hemmelgarn
2017-11-01 14:05       ` ST
2017-11-01 15:31         ` Lukas Pirl
2017-11-01 17:20         ` Austin S. Hemmelgarn
2017-11-02  9:09           ` ST
2017-11-02 11:01             ` Austin S. Hemmelgarn
2017-11-02 15:59               ` ST
     [not found]                 ` <E7316F3D-708C-4D5E-AB4B-F54B0B8471C1@rqc.ru>
2017-11-02 16:28                   ` ST
2017-11-02 17:13                     ` Austin S. Hemmelgarn
2017-11-02 17:32                       ` Andrei Borzenkov
2017-11-01 17:52       ` Andrei Borzenkov
2017-11-01 18:28         ` Austin S. Hemmelgarn
2017-11-01 12:15     ` Duncan
2017-10-31 16:29 ST
2017-11-06 21:48 ` waxhead

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.