From: Gabriel de Perthuis <g2p.code@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Possible to dedpulicate read-only snapshots for space-efficient backups
Date: Sun, 5 May 2013 12:55:54 +0000 (UTC)
Message-ID: <km5ksq$p15$1@ger.gmane.org>
In-Reply-To: mjnh5a-mcf.ln1@hurikhan.ath.cx

On Sun, 05 May 2013 12:07:17 +0200, Kai Krakow wrote:
> Hey list,
> 
> I wonder if it is possible to deduplicate read-only snapshots.
> 
> Background:
> 
> I'm using a bash/rsync script[1] to back up my whole system on a nightly 
> basis to an attached USB3 drive into a scratch area, then take a snapshot of 
> this area. I'd like to have these snapshots immutable, so they should be 
> read-only.
> 
> Since rsync won't discover moved files but instead place a new copy of that 
> in the backup, I'm running the wonderful bedup application[2] to deduplicate 
> my backup drive from time to time and it almost always gains back a good 
> pile of gigabytes. The rest of storage space issues is taken care of by 
> using rsync's inplace option (although this won't cover the case of files 
> moved and changed between backup runs) and using compress-force=gzip.

> I've read about ongoing work to integrate offline (and even online) 
> deduplication into the kernel so that this process can be made atomic (and 
> even block-based instead of file-based). This would, to my understanding, 
> mean the immutable attribute is no longer needed. So, given the above, and 
> in case read-only snapshots cannot currently be used for this application, 
> will these patches address the problem so that read-only snapshots can be 
> deduplicated? Or are read-only snapshots meant to be what the name 
> suggests: immutable, even for deduplication?

There's no deep reason read-only snapshots should keep their storage
immutable; they can be affected by RAID rebalancing, for example.

The current bedup restriction comes from the clone call; Mark Fasheh's
dedup ioctl[3] appears to be fine with snapshots.  The bedup integration
(in a branch) is a work in progress at the moment.  I need to fix a scan
bug, tweak parameters for the latest kernel dedup patch, remove a lot of
logic that is now unnecessary, and figure out the compatibility story.

> Regards,
> Kai
> 
> [1]: https://gist.github.com/kakra/5520370
> [2]: https://github.com/g2p/bedup

[3]: http://comments.gmane.org/gmane.comp.file-systems.btrfs/25062


