* backing up a file server with many subvolumes
@ 2017-03-26  3:00 J. Hart
  2017-03-26  9:14 ` Roman Mamedov
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: J. Hart @ 2017-03-26  3:00 UTC (permalink / raw)
  To: linux-btrfs

I have a Btrfs filesystem on a backup server.  This filesystem has a 
directory to hold backups for filesystems from remote machines.  In this 
directory is a subdirectory for each machine.  Under each machine 
subdirectory is one directory for each filesystem (ex /boot, /home, etc) 
on that machine.  In each filesystem subdirectory are incremental 
snapshot subvolumes for that filesystem.  The scheme is something like 
this:

<top>/backup/<machine>/<filesystem>/<many snapshot subvolumes>

I'd like to try to back up (duplicate) the file server filesystem 
containing these snapshot subvolumes for each remote machine.  The 
problem is that I don't think I can use send/receive to do this. "Btrfs 
send" requires "read-only" snapshots, and snapshots are not recursive as 
yet.  I think there are too many subvolumes which change too often to 
make doing this without recursion practical.

Any thoughts would be most appreciated.

J. Hart


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: backing up a file server with many subvolumes
  2017-03-26  3:00 backing up a file server with many subvolumes J. Hart
@ 2017-03-26  9:14 ` Roman Mamedov
  2017-03-26 19:51   ` Adam Borowski
  2017-03-26 20:24 ` Peter Grandi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Roman Mamedov @ 2017-03-26  9:14 UTC (permalink / raw)
  To: J. Hart; +Cc: linux-btrfs

On Sat, 25 Mar 2017 23:00:20 -0400
"J. Hart" <jfhart085@gmail.com> wrote:

> I have a Btrfs filesystem on a backup server.  This filesystem has a 
> directory to hold backups for filesystems from remote machines.  In this 
> directory is a subdirectory for each machine.  Under each machine 
> subdirectory is one directory for each filesystem (ex /boot, /home, etc) 
> on that machine.  In each filesystem subdirectory are incremental 
> snapshot subvolumes for that filesystem.  The scheme is something like 
> this:
> 
> <top>/backup/<machine>/<filesystem>/<many snapshot subvolumes>
> 
> I'd like to try to back up (duplicate) the file server filesystem 
> containing these snapshot subvolumes for each remote machine.  The 
> problem is that I don't think I can use send/receive to do this. "Btrfs 
> send" requires "read-only" snapshots, and snapshots are not recursive as 
> yet.  I think there are too many subvolumes which change too often to 
> make doing this without recursion practical.

You could have done time-based snapshots on the top level (for /backup/), say,
every 6 hours, and keep those for e.g. a month. Then don't bother with any
other kind of subvolumes/snapshots on the backup machine, and do backups from
remote machines into their respective subdirectories using simple 'rsync'.
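
A rough sketch of what I mean, with made-up paths and assuming /backup is
itself a subvolume (so it can be snapshotted):

  # from each remote machine, e.g. nightly:
  rsync -aHAX --delete /home/ backupserver:/backup/machine1/home/

  # on the backup server, every 6 hours from cron; the snapshot directory
  # has to be on the same btrfs filesystem as /backup:
  btrfs subvolume snapshot -r /backup /snapshots/backup.$(date +%Y%m%d-%H%M)

Expiring month-old snapshots is then just 'btrfs subvolume delete' on the
stale ones.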

That's what a sensible scheme looks like IMO, as opposed to a Btrfs-induced
exercise in futility that you have (there are subvolumes? must use them for
everything, even the frigging /boot/; there is send/receive? absolutely must
use it for backing up; etc.)

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: backing up a file server with many subvolumes
  2017-03-26  9:14 ` Roman Mamedov
@ 2017-03-26 19:51   ` Adam Borowski
  0 siblings, 0 replies; 10+ messages in thread
From: Adam Borowski @ 2017-03-26 19:51 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: J. Hart, linux-btrfs

On Sun, Mar 26, 2017 at 02:14:36PM +0500, Roman Mamedov wrote:
> You could have done time-based snapshots on the top level (for /backup/), say,
> every 6 hours, and keep those for e.g. a month. Then don't bother with any
> other kind of subvolumes/snapshots on the backup machine, and do backups from
> remote machines into their respective subdirectories using simple 'rsync'.
> 
> That's what a sensible scheme looks like IMO, as opposed to a Btrfs-induced
> exercise in futility that you have (there are subvolumes? must use them for
> everything, even the frigging /boot/; there is send/receive? absolutely must
> use it for backing up; etc.)

Using old boring rsync is actually a pretty good idea, with caveats.

I for one don't herd server farms, so the systems I manage tend to be special
snowflakes.  Some run modern btrfs, some are on ancient kernels, usually /
is on an mdraid with a traditional filesystem, and I have a bunch of ARM SoCs
at home -- plus even an ARM hosted server at Scaleway.  Standardizing on rsync
lets me make all those snowflakes back up the same way.  Only on the
destination do I make full use of btrfs features.

Another benefit of rsync is that I don't exactly trust that send from 3.13
to receive on 4.9 won't have a data loss bug, while rsync is extremely well
tested.

On the other hand, rsync is _slow_.  Mere stat() calls on a non-trivial
piece of spinning rust can take half an hour.  That's fine in a nightly
job, but what if you want to back up important stuff every 3 hours?
Especially if it is, say, Maildir mail -- many, many files to stat,
almost all of them cold.  Here send/receive shines.

And did I say that's important stuff?  So you send/receive to one target
every 3 hours, and rsync nightly to another.
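
Something like this, with invented paths and the snapshot rotation left out:

  # every 3 hours: take a read-only snapshot and send only the difference
  # against the previous one to the btrfs target
  btrfs subvolume snapshot -r /srv/mail /srv/snapshots/mail.new
  btrfs send -p /srv/snapshots/mail.prev /srv/snapshots/mail.new \
      | ssh target1 btrfs receive /backup/mail/

  # nightly: plain rsync of the live data to a second, independent target
  rsync -a --delete /srv/mail/ target2:/backup/mail-nightly/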

-- 
⢀⣴⠾⠻⢶⣦⠀ Meow!
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄⠀⠀⠀⠀ preimage for double rot13!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: backing up a file server with many subvolumes
  2017-03-26  3:00 backing up a file server with many subvolumes J. Hart
  2017-03-26  9:14 ` Roman Mamedov
@ 2017-03-26 20:24 ` Peter Grandi
  2017-03-27  5:57 ` Marat Khalili
  2017-03-27 11:53 ` Austin S. Hemmelgarn
  3 siblings, 0 replies; 10+ messages in thread
From: Peter Grandi @ 2017-03-26 20:24 UTC (permalink / raw)
  To: Linux fs Btrfs

> [ ... ] In each filesystem subdirectory are incremental
> snapshot subvolumes for that filesystem.  [ ... ] The scheme
> is something like this:

> <top>/backup/<machine>/<filesystem>/<many snapshot subvolumes>

BTW hopefully this does not amount to too many subvolumes in
the '.../backup/' volume, because that can create complications;
"too many" IIRC is more than a few dozen (even if a low number
of hundreds is still doable).

> I'd like to try to back up (duplicate) the file server
> filesystem containing these snapshot subvolumes for each
> remote machine. The problem is that I don't think I can use
> send/receive to do this. "Btrfs send" requires "read-only"
> snapshots, and snapshots are not recursive as yet.

Why is that a problem? What is a recursive snapshot?

> I think there are too many subvolumes which change too often
> to make doing this without recursion practical.

It is not clear to me how the «incremental snapshot subvolumes
for that filesystem» are made, whether with RSYNC or with 'send'
and 'receive' itself. It is also not clear to me why those
snapshots «change too often»; why would they change at all? Once
a backup is made, in whichever way, to an «incremental snapshot»,
why would that «incremental snapshot» ever change, other than by
being deleted?

There are some tools that rely on the specific abilities of
'send' with options '-p' and '-c' to save a lot of network
bandwidth and target storage space, perhaps you might be
interested in searching for them.

Anyhow I'll repeat here part of an answer to a similar message:
issues like yours are usually based on an incomplete understanding
of 'send' and 'receive', and on IRC, user "darkling" explained it
fairly well:

> When you use -c, you're telling the FS that it can expect to
> find a sent copy of that subvol on the receiving side, and
> that anything shared with it can be sent by reference. OK, so
> with -c on its own, you're telling the FS that "all the data
> in this subvol already exists on the remote".

> So, when you send your subvol, *all* of the subvol's metadata
> is sent, and where that metadata refers to an extent that's
> shared with the -c subvol, the extent data isn't sent, because
> it's known to be on the other end already, and can be shared
> directly from there.

> OK. So, with -p, there's a "base" subvol. The send subvol and
> the -p reference subvol are both snapshots of that base (at
> different times). The -p reference subvol, as with -c, is
> assumed to be on the remote FS. However, because it's known to
> be an earlier version of the same data, you can be more
> efficient in the sending by saying "start from the earlier
> version, and modify it in this way to get the new version"

> So, with -p, not all of the metadata is sent, because you know
> you've already got most of it on the remote in the form of the
> earlier version.

> So -p is "take this thing and apply these differences to it"
> and -c is "build this thing from scratch, but you can share
> some of the data with these sources"
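
To make that concrete, a hypothetical pair of invocations (subvolume names
invented, both snapshots read-only):

  # -p: only the differences against the earlier snapshot are sent; the
  # target must already hold snap.2017-03-25
  btrfs send -p home/snap.2017-03-25 home/snap.2017-03-26 \
      | btrfs receive /mnt/target/home/

  # -c: all metadata is sent, but extents already present in the named
  # clone source on the target are shared instead of re-sent
  btrfs send -c home/snap.2017-03-25 home/snap.2017-03-26 \
      | btrfs receive /mnt/target/home/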

Also here some additional details:

  http://logs.tvrrug.org.uk/logs/%23btrfs/2016-06-29.html#2016-06-29T22:39:59

The requirement for read-only exists because that way it is
reasonably certain that the same content is present on both the
origin and the target volume.

It may help to compare with RSYNC: it has to scan both the full
origin and target trees, because it cannot be told that there is
a parent tree that is the same on origin and target; but with
option '--link-dest' it can do something similar to 'send -c'.
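
For example, something along these lines (paths invented):

  # unchanged files become hard links to the previous day's backup instead
  # of fresh copies, which is roughly analogous to 'send -c'
  rsync -a --delete --link-dest=/backups/2017-03-25/ \
      remote:/home/ /backups/2017-03-26/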

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: backing up a file server with many subvolumes
  2017-03-26  3:00 backing up a file server with many subvolumes J. Hart
  2017-03-26  9:14 ` Roman Mamedov
  2017-03-26 20:24 ` Peter Grandi
@ 2017-03-27  5:57 ` Marat Khalili
  2017-03-27 12:00   ` J. Hart
  2017-04-01  8:24   ` Kai Krakow
  2017-03-27 11:53 ` Austin S. Hemmelgarn
  3 siblings, 2 replies; 10+ messages in thread
From: Marat Khalili @ 2017-03-27  5:57 UTC (permalink / raw)
  To: linux-btrfs

Just a consideration, since I've faced a similar but not exactly the same
problem: use rsync, but create snapshots on the target machine. Blind rsync
will destroy the deduplication of your snapshots and take a huge amount of
storage, so it's not a solution by itself. But you can rsync --inline your
snapshots in chronological order to some folder and re-take snapshots of
that folder, thus recreating your snapshot structure on the target.
Obviously, it can/should be automated.
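
Roughly like this (names invented, untested, and assuming the snapshot
names sort chronologically):

  # replay the source snapshots, oldest first, into one staging folder and
  # snapshot it after each pass; updating files in place keeps unchanged
  # blocks shared with the snapshots already taken of the staging folder
  for snap in /backup/machine1/home/*; do
      rsync -a --inplace --no-whole-file --delete \
          "$snap"/ /target/machine1/home-staging/
      btrfs subvolume snapshot -r /target/machine1/home-staging \
          /target/machine1/$(basename "$snap")
  done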


--

With Best Regards,
Marat Khalili

On 26/03/17 06:00, J. Hart wrote:
> I have a Btrfs filesystem on a backup server.  This filesystem has a 
> directory to hold backups for filesystems from remote machines. In 
> this directory is a subdirectory for each machine.  Under each machine 
> subdirectory is one directory for each filesystem (ex /boot, /home, 
> etc) on that machine.  In each filesystem subdirectory are incremental 
> snapshot subvolumes for that filesystem.  The scheme is something like 
> this:
>
> <top>/backup/<machine>/<filesystem>/<many snapshot subvolumes>
>
> I'd like to try to back up (duplicate) the file server filesystem 
> containing these snapshot subvolumes for each remote machine.  The 
> problem is that I don't think I can use send/receive to do this. 
> "Btrfs send" requires "read-only" snapshots, and snapshots are not 
> recursive as yet.  I think there are too many subvolumes which change 
> too often to make doing this without recursion practical.
>
> Any thoughts would be most appreciated.
>
> J. Hart
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: backing up a file server with many subvolumes
  2017-03-26  3:00 backing up a file server with many subvolumes J. Hart
                   ` (2 preceding siblings ...)
  2017-03-27  5:57 ` Marat Khalili
@ 2017-03-27 11:53 ` Austin S. Hemmelgarn
  2017-04-01  8:21   ` Kai Krakow
  3 siblings, 1 reply; 10+ messages in thread
From: Austin S. Hemmelgarn @ 2017-03-27 11:53 UTC (permalink / raw)
  To: jfhart085, linux-btrfs

On 2017-03-25 23:00, J. Hart wrote:
> I have a Btrfs filesystem on a backup server.  This filesystem has a
> directory to hold backups for filesystems from remote machines.  In this
> directory is a subdirectory for each machine.  Under each machine
> subdirectory is one directory for each filesystem (ex /boot, /home, etc)
> on that machine.  In each filesystem subdirectory are incremental
> snapshot subvolumes for that filesystem.  The scheme is something like
> this:
>
> <top>/backup/<machine>/<filesystem>/<many snapshot subvolumes>
>
> I'd like to try to back up (duplicate) the file server filesystem
> containing these snapshot subvolumes for each remote machine.  The
> problem is that I don't think I can use send/receive to do this. "Btrfs
> send" requires "read-only" snapshots, and snapshots are not recursive as
> yet.  I think there are too many subvolumes which change too often to
> make doing this without recursion practical.
>
> Any thoughts would be most appreciated.
In general, I would tend to agree with everyone else so far if you have 
to keep your current setup.  Use rsync with the --inplace option to send 
data to a staging location, then snapshot that staging location to do 
the actual backup.

Now, that said, I could probably give some more specific advice if I had 
a bit more info on how you're actually storing the backups.  There are 
three general ways you can do this with BTRFS and subvolumes:
1. Send/receive of snapshots from the system being backed up.
2. Use some other software to transfer the data into a staging location 
on the backup server, then snapshot that.
3. Use some other software to transfer the data, and have it handle 
snapshots instead of using BTRFS, possibly having it create subvolumes 
instead of directories at the top level for each system.

Of the three, I would generally recommend method 2: it doesn't
require the remote system to be using BTRFS, it generally scales pretty
well, and it amounts to essentially what people are already recommending
you do to back up your backup server.
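
A minimal sketch of method 2 (hostnames and paths invented):

  # pull the live data from the remote machine into a per-machine staging
  # subvolume, then freeze the result as a read-only snapshot
  rsync -aHAX --inplace --delete machine1:/home/ /backup/staging/machine1/home/
  btrfs subvolume snapshot -r /backup/staging/machine1 \
      /backup/machine1/snap.$(date +%F)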

On the note of needing read-only snapshots: in both cases 1 and 2, your
snapshots should be read-only on the server (method 1 mandates it,
method 2 makes it easy).  In case 3, the snapshots should ideally be
marked read-only some other way.  Having backups be writable is a bad
idea: it creates too many opportunities for software to screw things up,
and makes it impossible to tell whether you accidentally messed things
up yourself or something went wrong in your backup system or hardware.
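
For reference, both ways of getting a read-only snapshot are cheap
(paths invented):

  # read-only at creation time
  btrfs subvolume snapshot -r /backup/staging/machine1 \
      /backup/machine1/snap.2017-03-27

  # or flip an existing snapshot to read-only afterwards
  btrfs property set -ts /backup/machine1/snap.2017-03-27 ro true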

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: backing up a file server with many subvolumes
  2017-03-27  5:57 ` Marat Khalili
@ 2017-03-27 12:00   ` J. Hart
  2017-03-27 13:05     ` Graham Cobb
  2017-04-01  8:24   ` Kai Krakow
  1 sibling, 1 reply; 10+ messages in thread
From: J. Hart @ 2017-03-27 12:00 UTC (permalink / raw)
  To: Marat Khalili; +Cc: linux-btrfs

That is a very interesting idea.  I'll try some experiments with this.

Many Thanks for the assistance....:-)

J. Hart


On 03/27/2017 01:57 AM, Marat Khalili wrote:
> Just some consideration, since I've faced similar but no exactly same 
> problem: use rsync, but create snapshots on target machine. Blind 
> rsync will destroy deduplication of your snapshots and take huge 
> amount of storage, so it's not a solution. But you can rsync --inline 
> your snapshots in chronological order to some folder and re-take 
> snapshots of that folder, thus recreating your snapshots structure on 
> target. Obviously, it can/should be automated.
>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: backing up a file server with many subvolumes
  2017-03-27 12:00   ` J. Hart
@ 2017-03-27 13:05     ` Graham Cobb
  0 siblings, 0 replies; 10+ messages in thread
From: Graham Cobb @ 2017-03-27 13:05 UTC (permalink / raw)
  To: jfhart085; +Cc: linux-btrfs

On 27/03/17 13:00, J. Hart wrote:
> That is a very interesting idea.  I'll try some experiments with this.

You might want to look into two tools which I have found useful for
similar backups:

1) rsnapshot -- this uses rsync for backing up multiple systems and has
been stable for quite a long time. If the target disk is btrfs it is
fairly easy to configure so that it uses btrfs snapshots to create and
remove the snapshot directories, speeding up the process. This doesn't
really use any complex btrfs features and has been stable for me even on
my Debian stable (kernel 3.16.39) system.

2) btrbk -- this allows you to create and manage btrfs snapshots on the
source disk as well as backup snapshots on a separate btrfs disk. You
can separately control how many snapshots you keep online on both the
source and the backup disk. This is particularly useful for cases where
you want to take very frequent snapshots (say hourly) for which rsync
may be too slow (and rsync does not take a consistent snapshot, of course).
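
In case it helps, the day-to-day driver is just this (config path invented;
the source subvolumes and send-receive targets are declared in the config,
see btrbk.conf(5)):

  # preview what would be snapshotted and transferred
  btrbk --config /etc/btrbk/btrbk.conf dryrun

  # take the snapshots and send/receive them to the backup disk
  btrbk --config /etc/btrbk/btrbk.conf run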

There are many other tools, of course (I also take daily backups with
dar to an ext4 system, without using any btrfs features at all, just in
case a new version of btrfs suddenly decided to correct all copies of
IHATEBTRFS on the disk to ILOVEBTRFS, for example :-) ).

Graham

Note to self: re-read this message periodically to check that feature
hasn't appeared yet.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: backing up a file server with many subvolumes
  2017-03-27 11:53 ` Austin S. Hemmelgarn
@ 2017-04-01  8:21   ` Kai Krakow
  0 siblings, 0 replies; 10+ messages in thread
From: Kai Krakow @ 2017-04-01  8:21 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 27 Mar 2017 07:53:17 -0400,
"Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:

> > I'd like to try to back up (duplicate) the file server filesystem
> > containing these snapshot subvolumes for each remote machine.  The
> > problem is that I don't think I can use send/receive to do this.
> > "Btrfs send" requires "read-only" snapshots, and snapshots are not
> > recursive as yet.  I think there are too many subvolumes which
> > change too often to make doing this without recursion practical.
> >
> > Any thoughts would be most appreciated.  
> In general, I would tend to agree with everyone else so far if you
> have to keep your current setup.  Use rsync with the --inplace option
> to send data to a staging location, then snapshot that staging
> location to do the actual backup.
> 
> Now, that said, I could probably give some more specific advice if I
> had a bit more info on how you're actually storing the backups.
> There are three general ways you can do this with BTRFS and
> subvolumes: 1. Send/receive of snapshots from the system being backed
> up. 2. Use some other software to transfer the data into a staging
> location on the backup server, then snapshot that.
> 3. Use some other software to transfer the data, and have it handle 
> snapshots instead of using BTRFS, possibly having it create
> subvolumes instead of directories at the top level for each system.

If you decide on (3), I can recommend borgbackup. It does variable-block-size
deduplication across all backup sources, though to get the full benefit your
backups can only be done serially, not in parallel: borgbackup cannot access
the same repository from two processes at once, and deduplication only works
within a repository.
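
A rough sketch of how that looks (paths invented):

  # one repository; machines are backed up one after another, never in
  # parallel
  borg init --encryption=repokey /mnt/backup/borg-repo
  borg create --stats /mnt/backup/borg-repo::machine1-{now} /backup/machine1
  borg create --stats /mnt/backup/borg-repo::machine2-{now} /backup/machine2

  # thin out old archives per machine
  borg prune --prefix machine1- --keep-daily 7 --keep-weekly 4 \
      /mnt/backup/borg-repo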

Another recommendation for backups is the 3-2-1 rule:

  * have at least 3 different copies of your data (that means, your
    original data, the backup copy, and another backup copy, separated
    in a way they cannot fail for the same reason)
  * use at least 2 different media (that also means: don't backup
    btrfs to btrfs, and/or use 2 different backup techniques)
  * keep at least 1 external copy (maybe rsync to a remote location)

The 3 copy rule can be deployed by using different physical locations,
different device types, different media, and/or different backup
programs. So it's kind of entangled with the 2 and 1 rule.

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: backing up a file server with many subvolumes
  2017-03-27  5:57 ` Marat Khalili
  2017-03-27 12:00   ` J. Hart
@ 2017-04-01  8:24   ` Kai Krakow
  1 sibling, 0 replies; 10+ messages in thread
From: Kai Krakow @ 2017-04-01  8:24 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 27 Mar 2017 08:57:17 +0300,
Marat Khalili <mkh@rqc.ru> wrote:

> Just some consideration, since I've faced similar but no exactly same 
> problem: use rsync, but create snapshots on target machine. Blind
> rsync will destroy deduplication of your snapshots and take huge
> amount of storage, so it's not a solution. But you can rsync --inline
> your snapshots in chronological order to some folder and re-take
> snapshots of that folder, thus recreating your snapshots structure on
> target. Obviously, it can/should be automated.

I think it's --inplace and --no-whole-file...

Apparently rsync cannot detect moved files, which was a big deal for me
regarding deduplication, so I found another solution which is even
faster. See my other reply.

> On 26/03/17 06:00, J. Hart wrote:
> > I have a Btrfs filesystem on a backup server.  This filesystem has
> > a directory to hold backups for filesystems from remote machines.
> > In this directory is a subdirectory for each machine.  Under each
> > machine subdirectory is one directory for each filesystem
> > (ex /boot, /home, etc) on that machine.  In each filesystem
> > subdirectory are incremental snapshot subvolumes for that
> > filesystem.  The scheme is something like this:
> >
> > <top>/backup/<machine>/<filesystem>/<many snapshot subvolumes>
> >
> > I'd like to try to back up (duplicate) the file server filesystem 
> > containing these snapshot subvolumes for each remote machine.  The 
> > problem is that I don't think I can use send/receive to do this. 
> > "Btrfs send" requires "read-only" snapshots, and snapshots are not 
> > recursive as yet.  I think there are too many subvolumes which
> > change too often to make doing this without recursion practical.
> >
> > Any thoughts would be most appreciated.
> >
> > J. Hart
> >



-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 10+ messages in thread

Thread overview: 10+ messages
2017-03-26  3:00 backing up a file server with many subvolumes J. Hart
2017-03-26  9:14 ` Roman Mamedov
2017-03-26 19:51   ` Adam Borowski
2017-03-26 20:24 ` Peter Grandi
2017-03-27  5:57 ` Marat Khalili
2017-03-27 12:00   ` J. Hart
2017-03-27 13:05     ` Graham Cobb
2017-04-01  8:24   ` Kai Krakow
2017-03-27 11:53 ` Austin S. Hemmelgarn
2017-04-01  8:21   ` Kai Krakow
