* Git as a backup system?
@ 2010-11-08 18:01 Eric Frederich
  2010-11-08 18:04 ` Jonathan Nieder
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Eric Frederich @ 2010-11-08 18:01 UTC (permalink / raw)
  To: git

I maintain a corporate MediaWiki installation.
Currently I have a cron job that runs daily and tar's up the contents
of the installation directory and runs a mysqldump.
I keep backups of the past 45 days.
Each backup is about 200M, so all in all I always have about 9.0G of backups.
Most of the changes are in the database, so the mysqldump file is
changed every day.
Other than that, there can be new files uploaded but they never
change, just get added.
All configuration files stay the same.

I wrote a script that untarred the contents of each backup, gunzipped the
mysql dump, and made a git commit.
The resulting .git directory wound up being 837M, but after running a
long (8 minute) "git gc" command, it went down to 204M.

== Questions ==
What mysqldump options would be good to use for storage in git?
Right now I'm not passing any parameters to mysqldump and it's doing
all inserts for each table on a single huge line.
Would git handle it better if each insert was on its own line?

Let's say that the repo gets too big and I want to throw away history.
I'd have a linear history with a single commit every day.
Is there a way to take just the last 30 commits and throw away everything else?

Am I insane?  Are there other tools more suited toward this?
I just thought of using Git since I looked at my 9G worth of data out
there in my backup directory that is almost exactly the same and said
"git could handle this well".

Are any of you using git for a backup system?  Have any tips, words of wisdom?

Thanks,
~Eric

* Re: Git as a backup system?
  2010-11-08 18:01 Git as a backup system? Eric Frederich
@ 2010-11-08 18:04 ` Jonathan Nieder
  2010-11-08 18:26 ` Konstantin Khomoutov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2010-11-08 18:04 UTC (permalink / raw)
  To: Eric Frederich; +Cc: git

Eric Frederich wrote:

> Am I insane?  Are there other tools more suited toward this?
> I just thought of using Git since I looked at my 9G worth of data out
> there in my backup directory that is almost exactly the same and said
> "git could handle this well".

You might like bup: https://github.com/apenwarr/bup#readme

* Re: Git as a backup system?
  2010-11-08 18:01 Git as a backup system? Eric Frederich
  2010-11-08 18:04 ` Jonathan Nieder
@ 2010-11-08 18:26 ` Konstantin Khomoutov
  2010-11-08 20:06 ` Dirk Süsserott
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Konstantin Khomoutov @ 2010-11-08 18:26 UTC (permalink / raw)
  To: Eric Frederich; +Cc: git

On Mon, 8 Nov 2010 13:01:29 -0500
Eric Frederich <eric.frederich@gmail.com> wrote:

> I maintain a corporate MediaWiki installation.
> Currently I have a cron job that runs daily and tar's up the contents
> of the installation directory and runs a mysqldump.
> I keep backups of the past 45 days.
> Each backup is about 200M, so all in all I always have about 9.0G of
> backups. Most of the changes are in the database, so the mysqldump
> file is changed every day.
> Other than that, there can be new files uploaded but they never
> change, just get added.
> All configuration files stay the same.
[...]
> Am I insane?  Are there other tools more suited toward this?
> I just thought of using Git since I looked at my 9G worth of data out
> there in my backup directory that is almost exactly the same and said
> "git could handle this well".
> 
> Are any of you using git for a backup system?  Have any tips, words
> of wisdom?
I suspect the rdiff-backup tool [1] was invented for precisely a setup
like yours: it can sync one directory to another and create diffs
between them, which gives you incremental backups.

1. http://www.nongnu.org/rdiff-backup/

* Re: Git as a backup system?
  2010-11-08 18:01 Git as a backup system? Eric Frederich
  2010-11-08 18:04 ` Jonathan Nieder
  2010-11-08 18:26 ` Konstantin Khomoutov
@ 2010-11-08 20:06 ` Dirk Süsserott
  2010-11-08 20:25 ` Ævar Arnfjörð Bjarmason
  2010-11-09  0:17 ` Patrick Rouleau
  4 siblings, 0 replies; 6+ messages in thread
From: Dirk Süsserott @ 2010-11-08 20:06 UTC (permalink / raw)
  To: Eric Frederich; +Cc: git

On 08.11.2010 19:01, Eric Frederich wrote:
> I maintain a corporate MediaWiki installation.
> Currently I have a cron job that runs daily and tar's up the contents
> of the installation directory and runs a mysqldump.
>
> I wrote a script that untarred the contents of each backup, gunzipped the
> mysql dump, and made a git commit.
> The resulting .git directory wound up being 837M, but after running a
> long (8 minute) "git gc" command, it went down to 204M.
>
> == Questions ==
> What mysqldump options would be good to use for storage in git?
> Right now I'm not passing any parameters to mysqldump and it's doing
> all inserts for each table on a single huge line.
> Would git handle it better if each insert was on its own line?
>
> Are any of you using git for a backup system?  Have any tips, words of wisdom?
>
> Thanks,
> ~Eric

Hi Eric,

I also use mysqldump and Git to make backups of my databases. Git indeed
performs much better when each record is on a separate line. mysqldump
has an option for that, but I don't recommend it because it dramatically
slows down both the dump and the restore: it creates a separate
"insert into ..." statement for each row.

For me the attached script worked very well: I pipe the output of 
mysqldump through the script and it simply inserts a linefeed after each 
record.

----------------------------------
#!/usr/bin/perl -p

use strict;
use warnings;

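# Before (one long line per table):
# INSERT INTO `schliess_grund` VALUES (1,'Explizit'),(2,'Neuanmeldung'),(4,'Sperrung'),(3,'Timeout');
#
# After (one record per line, so a changed record is a one-line diff):
# INSERT INTO `schliess_grund` VALUES
#      (1,'Explizit')
#     ,(2,'Neuanmeldung')
#     ,(4,'Sperrung')
#     ,(3,'Timeout')
#     ;
#
# Note: the second substitution also matches "),(" inside string data.
if (/^(INSERT INTO .*? VALUES) (.*);$/)
{
     # Put the record list and the closing semicolon on their own lines.
     $_ = "$1\n     $2\n    ;\n";
     # Break the record list before each comma that separates two records.
     s/\),\(/\)\n    ,\(/g;
}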
----------------------------------

The changeset will be much smaller. Let's call the script "wrap.pl". 
Then run the following:

----------------------------------
mysqldump --opt --routines [...] -r <outfile.tmp> <dbname>
./wrap.pl <outfile.tmp> > <outfile>; rm <outfile.tmp>
git add <outfile>
if ! git diff-index --quiet HEAD --; then
     git commit -m "Backup of ..."
fi
----------------------------------
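
The commit-only-if-changed step above could also be wrapped in a small shell function. This is a minimal sketch, not Dirk's script: the name `commit_if_changed` and its two arguments are hypothetical, and it uses `git diff --cached --quiet` instead of `git diff-index`, so that it also works in a fresh repository that has no HEAD yet.

```shell
# Hypothetical helper (not from the thread): stage one dump file and
# commit it only when its content differs from the last commit.
commit_if_changed() {
    file=$1
    message=$2
    git add -- "$file" || return 1
    # "git diff --cached --quiet" exits 0 when nothing is staged.
    if git diff --cached --quiet; then
        echo "no changes, skipping commit"
    else
        git commit --quiet -m "$message"
    fi
}
```

Called from inside the backup repository, for example as `commit_if_changed dump.sql "Backup of wiki"`, it leaves the history untouched on days when the dump is identical.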

Try it out!

Cheers,
     Dirk

* Re: Git as a backup system?
  2010-11-08 18:01 Git as a backup system? Eric Frederich
                   ` (2 preceding siblings ...)
  2010-11-08 20:06 ` Dirk Süsserott
@ 2010-11-08 20:25 ` Ævar Arnfjörð Bjarmason
  2010-11-09  0:17 ` Patrick Rouleau
  4 siblings, 0 replies; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-11-08 20:25 UTC (permalink / raw)
  To: Eric Frederich; +Cc: git

On Mon, Nov 8, 2010 at 19:01, Eric Frederich <eric.frederich@gmail.com> wrote:
> I maintain a corporate MediaWiki installation.
> Currently I have a cron job that runs daily and tar's up the contents
> of the installation directory and runs a mysqldump.
> I keep backups of the past 45 days.
> Each backup is about 200M, so all in all I always have about 9.0G of backups.
> Most of the changes are in the database, so the mysqldump file is
> changed every day.
> Other than that, there can be new files uploaded but they never
> change, just get added.
> All configuration files stay the same.
>
> I wrote a script that untarred the contents of each backup, gunzipped the
> mysql dump, and made a git commit.
> The resulting .git directory wound up being 837M, but after running a
> long (8 minute) "git gc" command, it went down to 204M.
>
> == Questions ==
> What mysqldump options would be good to use for storage in git?
> Right now I'm not passing any parameters to mysqldump and it's doing
> all inserts for each table on a single huge line.
> Would git handle it better if each insert was on its own line?

I use git to back up all my data. It works great. But how big the
dumps get depends very much on the database.

Here's a graph of the size of my mysql backup directories:
http://munin.nix.is/nix.is/v.nix.is/dirs_var_backup_mysql.html

I use this little wrapper script:
https://github.com/avar/linode-etc/blob/master/bin/cron/mysqldump-to-git-all

Which calls this:
https://github.com/avar/linode-etc/blob/master/bin/cron/sqldump-to-git

I find that with MySQL, --skip-extended-insert and --compact work really well.
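
A dump using those two options might be invoked as in the sketch below. The wrapper scripts linked above are not reproduced here; the function name `dump_for_git`, its arguments, and the `MYSQLDUMP` override variable are assumptions made for illustration, and connection options are left out.

```shell
# Hypothetical helper (not from the thread): dump one database with the
# two options mentioned above. MYSQLDUMP can be overridden for dry runs;
# connection options (user, password, host) are omitted.
dump_for_git() {
    db=$1
    out=$2
    # --skip-extended-insert: one INSERT statement per row, so a changed
    #   row shows up as a changed line in the diff.
    # --compact: suppresses comments and several surrounding statements
    #   (such as DROP TABLE and table-lock statements).
    ${MYSQLDUMP:-mysqldump} --skip-extended-insert --compact "$db" > "$out"
}
```

One INSERT per row is the same mysqldump behaviour that was described earlier in the thread as slowing down dump and restore, so the smaller diffs are traded against speed.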

Then I use this to repack + gc the repos:
https://github.com/avar/linode-etc/blob/master/bin/cron/git-repack-and-gc-dir
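
The contents of the linked script are not shown in this thread. As a rough stand-in, a periodic repack of one backup repository might look like the sketch below; the function name is hypothetical, and the choice of `git gc --aggressive` is an assumption, not necessarily what the linked script runs.

```shell
# Hypothetical helper (not the linked script): repack one backup
# repository and print object-store statistics before and after.
repack_backup_repo() {
    repo=$1
    git -C "$repo" count-objects -vH
    # --aggressive spends more CPU time looking for better deltas.
    git -C "$repo" gc --aggressive --quiet
    git -C "$repo" count-objects -vH
}
```

Comparing the `size-pack` lines of the two reports gives the kind of before/after figure quoted in the original post.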

* Re: Git as a backup system?
  2010-11-08 18:01 Git as a backup system? Eric Frederich
                   ` (3 preceding siblings ...)
  2010-11-08 20:25 ` Ævar Arnfjörð Bjarmason
@ 2010-11-09  0:17 ` Patrick Rouleau
  4 siblings, 0 replies; 6+ messages in thread
From: Patrick Rouleau @ 2010-11-09  0:17 UTC (permalink / raw)
  To: git

Eric Frederich <eric.frederich <at> gmail.com> writes:

> Are any of you using git for a backup system?  Have any tips, words of wisdom?

I had the same idea: to back up a MySQL database with git.

To be able to easily drop old "backups", I chose to go with monthly
branches: each month, the previous month's branch is checked out and a new
branch is created; each year, the previous year is checked out...

This has been running for only 5 weeks, but so far it works pretty well.
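
For the strictly linear, one-commit-per-day history described at the top of the thread, old backups could also be dropped without per-month branches. The following is a minimal sketch, not something proposed in the thread: the function name `keep_last_commits` is hypothetical, it assumes a clean working tree and a history with no merges, and it rewrites commit IDs and discards the reflog, so it is best tried on a copy of the repository first.

```shell
# Hypothetical helper (not from the thread): keep only the last N commits
# of a strictly linear history on the current branch.
keep_last_commits() {
    n=$1
    total=$(git rev-list --count HEAD) || return 1
    # Nothing to do if the history is already short enough.
    [ "$total" -gt "$n" ] || return 0
    # Oldest commit to keep.
    base=$(git rev-parse "HEAD~$((n - 1))") || return 1
    # New parentless commit carrying exactly the same files as $base.
    new_root=$(git commit-tree -m "Start of truncated history" "$base^{tree}") || return 1
    # Replay the commits after $base on top of the new root.
    git rebase --quiet --onto "$new_root" "$base" || return 1
    # The old commits stay on disk until the reflog entries are expired.
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```

Running `keep_last_commits 30` inside the backup repository would leave 30 commits, the oldest of which has no parent.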
