* git-repack & big files
@ 2011-01-11  7:37 Pietro Battiston
  2011-01-11 15:43 ` Phillip Susi
  0 siblings, 1 reply; 6+ messages in thread
From: Pietro Battiston @ 2011-01-11  7:37 UTC (permalink / raw)
  To: git

Hello,

first, I do know git is not optimized for big files, and that's fine.
But it is able, on my machine with 3 GB of RAM, to successfully back up my
home directory¹, which contains, among others, several files of several
hundreds of megabytes each. And I like that a lot.

Since it perfectly does what it is not optimized to do... I then wonder
when it does not do what it declares: if I run git-repack² with the
parameter --window-memory set to, for instance, "100m", it takes
hundreds and hundreds of MB of memory until it runs out of memory, fails
a malloc and aborts.
So, two questions:

1) is there a bug, is the documentation about that parameter a bit too
optimistic or did I just not understand it?

2) do I have any hope that in one way or another my 500+ MB mailboxes
with relatively small changes over time are archived smartly (=diffs) by
git at the current state of development? If I understand correctly, the
project git-bigfiles³ would just "solve" my problems by not making
differences of big files.

thanks for the clarifications

Pietro


¹ Just for the record: through gibak:
http://eigenclass.org/hiki/gibak-0.3.0

² git version 1:1.7.2.3-2.2 on Debian

³ http://caca.zoy.org/wiki/git-bigfiles


* Re: git-repack & big files
  2011-01-11  7:37 git-repack & big files Pietro Battiston
@ 2011-01-11 15:43 ` Phillip Susi
  2011-01-11 19:03   ` Pietro Battiston
  0 siblings, 1 reply; 6+ messages in thread
From: Phillip Susi @ 2011-01-11 15:43 UTC (permalink / raw)
  To: Pietro Battiston; +Cc: git

On 1/11/2011 2:37 AM, Pietro Battiston wrote:
> Since it perfectly does what it is not optimized to do... I then wonder
> when it does not do what it declares: if I run git-repack² with the
> parameter --window-memory set to, for instance, "100m", it takes
> hundreds and hundreds of MB of memory until it runs out of memory, fails
> a malloc and aborts.
> So, two questions:

--window-memory reduces the window size to try to stay under the limit,
but the window size cannot be reduced below 1.
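
To illustrate the point above, here is a minimal toy model in Python. It is
not git's actual code: the function name, the list of object sizes and the
eviction order are assumptions made for the sketch. It only shows why a
window that cannot shrink below one entry cannot honour a 100 MB limit once
a single object is larger than that limit.

```python
MB = 1024 * 1024


def peak_window_memory(object_sizes, window, memory_limit):
    """Toy model (hypothetical, not git's implementation).

    Objects enter a sliding window one by one. The oldest entries are
    dropped while the window holds too many objects or too many bytes,
    but at least one entry is always kept.
    Returns the largest number of bytes held at any point.
    """
    held = []  # sizes of the objects currently in the window
    peak = 0
    for size in object_sizes:
        held.append(size)
        # Enforce the limit on the number of objects in the window.
        while len(held) > window:
            held.pop(0)
        # Enforce the memory limit, but never shrink below one entry.
        while sum(held) > memory_limit and len(held) > 1:
            held.pop(0)
        peak = max(peak, sum(held))
    return peak


# Small objects: the window shrinks enough to respect the limit.
small = peak_window_memory([10 * MB] * 20, window=10, memory_limit=100 * MB)

# One 500 MB object: the window is already at 1 and the limit is exceeded.
large = peak_window_memory([500 * MB] * 3, window=10, memory_limit=100 * MB)
```

In this model the memory held is simply the sum of the object sizes; a real
delta search also needs working memory beyond the objects themselves, so the
real footprint can be expected to be larger than what the sketch reports.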

> 2) do I have any hope that in one way or another my 500+ MB mailboxes
> with relatively small changes over time are archived smartly (=diffs) by
> git at the current state of development? If I understand correctly, the
> project git-bigfiles³ would just "solve" my problems by not making
> differences of big files.

Git is not a backup tool.  You should use rsync or rdiff-backup instead.


* Re: git-repack & big files
  2011-01-11 15:43 ` Phillip Susi
@ 2011-01-11 19:03   ` Pietro Battiston
  2011-01-11 19:12     ` Stephen Bash
  2011-01-11 19:48     ` Phillip Susi
  0 siblings, 2 replies; 6+ messages in thread
From: Pietro Battiston @ 2011-01-11 19:03 UTC (permalink / raw)
  To: Phillip Susi; +Cc: git

On Tue, 2011-01-11 at 10:43 -0500, Phillip Susi wrote:
> On 1/11/2011 2:37 AM, Pietro Battiston wrote:
> > Since it perfectly does what it is not optimized to do... I then wonder
> > when it does not do what it declares: if I run git-repack² with the
> > parameter --window-memory set to, for instance, "100m", it takes
> > hundreds and hundreds of MB of memory until it runs out of memory, fails
> > a malloc and aborts.
> > So, two questions:
> 
> --window-memory reduces the window size to try to stay under the limit,
> but the window size cannot be reduced below 1.

OK, I think I understood. Still, I don't think there are many doubts
that the documentation is misleading when it says "the window size will
dynamically scale down so as to not take up more than N bytes in
memory". That's all.

> > 2) do I have any hope that in one way or another my 500+ MB mailboxes
> > with relatively small changes over time are archived smartly (=diffs) by
> > git at the current state of development? If I understand correctly, the
> > project git-bigfiles³ would just "solve" my problems by not making
> > differences of big files.
> 
> Git is not a backup tool.  You should use rsync or rdiff-backup instead.

That's unfortunate - I think I would rather split my mailboxes than
lose many of the nice features git provides. But thanks a lot for the
suggestion.

Pietro


* Re: git-repack & big files
  2011-01-11 19:03   ` Pietro Battiston
@ 2011-01-11 19:12     ` Stephen Bash
  2011-01-11 19:48     ` Phillip Susi
  1 sibling, 0 replies; 6+ messages in thread
From: Stephen Bash @ 2011-01-11 19:12 UTC (permalink / raw)
  To: Pietro Battiston; +Cc: git, Phillip Susi

----- Original Message -----
> From: "Pietro Battiston" <me@pietrobattiston.it>
> To: "Phillip Susi" <psusi@cfl.rr.com>
> Sent: Tuesday, January 11, 2011 2:03:23 PM
> Subject: Re: git-repack & big files
>
> > > 2) do I have any hope that in one way or another my 500+ MB
> > > mailboxes
> > > with relatively small changes over time are archived smartly
> > > (=diffs) by
> > > git at the current state of development? If I understand
> > > correctly, the
> > > project git-bigfiles³ would just "solve" my problems by not making
> > > differences of big files.
> >
> > Git is not a backup tool. You should use rsync or rdiff-backup instead.
> 
> That's unfortunate - I think I would rather split my mailboxes than
> lose many of the nice features git provides. But thanks a lot for the
> suggestion.

Off topic: Do you have the option to switch to a solution that uses Maildir as the storage format rather than mbox?
   
   http://en.wikipedia.org/wiki/Maildir
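
For readers who want to try this, a minimal sketch of such a conversion
using Python's standard `mailbox` module follows. The helper names
(`mbox_to_maildir`, `demo`), the addresses and the temporary paths are
hypothetical and only there to make the example self-contained. With
Maildir each message becomes its own small file, so a backup sees new and
deleted files instead of one large file that keeps changing.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            target.add(message)  # each message is written as a separate file
            copied += 1
        target.flush()
    finally:
        source.close()
        target.close()
    return copied


def demo():
    """Build a three-message mbox in a temporary directory and convert it."""
    with tempfile.TemporaryDirectory() as tmp:
        mbox_path = os.path.join(tmp, "inbox.mbox")
        box = mailbox.mbox(mbox_path)
        for n in range(3):
            msg = EmailMessage()
            msg["From"] = "sender@example.org"
            msg["To"] = "me@example.org"
            msg["Subject"] = f"message {n}"
            msg.set_content(f"body {n}\n")
            box.add(msg)
        box.flush()
        box.close()

        maildir_path = os.path.join(tmp, "Maildir")
        copied = mbox_to_maildir(mbox_path, maildir_path)
        # Count the message files in the Maildir delivery subdirectories.
        stored = sum(
            len(os.listdir(os.path.join(maildir_path, sub)))
            for sub in ("new", "cur")
        )
        return copied, stored
```

This copies messages and leaves the original mbox untouched; whether the
mail client in use can then read the Maildir is a separate question that
the sketch does not address.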

Stephen


* Re: git-repack & big files
  2011-01-11 19:03   ` Pietro Battiston
  2011-01-11 19:12     ` Stephen Bash
@ 2011-01-11 19:48     ` Phillip Susi
  2011-01-13  8:00       ` Pietro Battiston
  1 sibling, 1 reply; 6+ messages in thread
From: Phillip Susi @ 2011-01-11 19:48 UTC (permalink / raw)
  To: Pietro Battiston; +Cc: git

On 1/11/2011 2:03 PM, Pietro Battiston wrote:
> That's unfortunate - I think I would rather split my mailboxes than
> lose many of the nice features git provides. But thanks a lot for the
> suggestion.

I'm curious what features of git you find helpful for this purpose.  The
history log doesn't seem useful at all.  Generally mail is only added,
and sometimes deleted, never changed, so it also does not seem useful to
keep multiple revisions.  If you really want that though, rdiff-backup
will do that and keep the old revisions delta compressed.

I use Maildir instead of mbox and do a nightly incremental backup of the
whole system with dump, so any mail I might lose by accident I can pull
from the backup if it is older than a day, and I never delete mail.


* Re: git-repack & big files
  2011-01-11 19:48     ` Phillip Susi
@ 2011-01-13  8:00       ` Pietro Battiston
  0 siblings, 0 replies; 6+ messages in thread
From: Pietro Battiston @ 2011-01-13  8:00 UTC (permalink / raw)
  To: Phillip Susi; +Cc: git

On Tue, 2011-01-11 at 14:48 -0500, Phillip Susi wrote:
> On 1/11/2011 2:03 PM, Pietro Battiston wrote:
> > That's unfortunate - I think I would rather split my mailboxes than
> > lose many of the nice features git provides. But thanks a lot for the
> > suggestion.
> 
> I'm curious what features of git you find helpful for this purpose.  

Many more, I guess, than the ones I'll be able to remember now. But for
instance, some features that make it better than rdiff-backup are:

1) I like to see how a given file changed at a given point in time with
a comfortable interface - not just "restore this there", or search for
the right diff gzipped somewhere

2) I like to delete some given files/folders that I forgot to
(.git)ignore from all the backups with a single command

3) I love the fact that if I move/rename a file/folder, git notices it
(and doesn't think I just deleted some files and created some others).
Since I often move/rename files/folders, when I got to know git I really
thought "after years of waiting, finally backups will be smart".

4) in principle - though I admit I have never tried it - I like the idea
that if I have two copies of the git repo, I can back up once on each
(think of one staying home and one following me when I travel) and then
rebase one on the other

5) to back up my home directory the first time, rdiff-backup takes slightly
less than 5 hours and uses 32 GB; git takes around 2 hours and uses 17 GB

6) in general, just having a powerful interface I'm used to

> The
> history log doesn't seem useful at all. 

I also like the fact that my commits have comments such as "before
changing PC", "after system upgrade", "before reordering mail"...


> Generally mail is only added,
> and sometimes deleted, never changed, so it also does not seem useful to
> keep multiple revisions.  

I'm not sure I get what you mean - mail is added and deleted, hence the
mailbox is changed, hence I find it useful to keep multiple revisions.


> If you really want that though, rdiff-backup
> will do that and keep the old revisions delta compressed.
> 

Yes, I think I will live with rdiff-backup. And miss git, but I
perfectly understand that git simply doesn't aim at solving my problem,
and that's fair.


> I use Maildir instead of mbox and do a nightly incremental backup of the
> whole system with dump, so any mail I might lose by accident I can pull
> from the backup if it is older than a day, and I never delete mail.

Yep, Maildir is nice from this point of view. But nope, it is not
practical for me to change now.

thanks to all for suggestions (and sorry for the OT)

Pietro
