* Cutting history
From: Enrico Weigelt @ 2010-07-10  3:25 UTC
  To: git


Hi folks,


I'm using git for automatic backups (e.g. database dumps). This
works quite well, but as time goes on, the history (and so the repo)
gets larger and larger. It would be really nice to be able to cut
off old stuff (e.g. everything older than the last N commits).

Maybe that could be done by introducing "stopper" tags: commits
that have a stopper tag may have missing parents, and git-gc
can be told to ignore those parents and throw away everything
behind the stopper (if not referenced otherwise).

A probably cleaner, but more invasive, way could be turning refs
into vectors, which may contain stop points (multiple ones in case
of merges) in addition to the start point. Remote transfers would
then only contain the commits within this range, and GC would also
just scan the range (instead of following all parents).


What do you think about this?


cu
-- 
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weigelt@metux.de
 mobile: +49 151 27565287  icq:   210169427         skype: nekrad666
----------------------------------------------------------------------
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
----------------------------------------------------------------------


* Re: Cutting history
From: Joshua Jensen @ 2010-07-10  4:08 UTC
  To: weigelt; +Cc: git

  ----- Original Message -----
From: Enrico Weigelt
Date: 7/9/2010 9:25 PM
> I'm using git for automatic backups (eg. database dumps). This
> works quite well, but as time goes, the history (and so the repo)
> gets larger and larger. It would be really nice to allow cutting
> off old stuff (eg. after N commits in the past).
>
> Maybe that could be done by introducing "stopper" tags: commits
> that have an stopper-tag may have missing parents, and git-gc
> can be told to ignore those parents and throw away everything
> behind the stopper (if not referenced otherwise).
>
> A probably cleaner, but more invasive way could be making refs
> to vectors, which may contain stop points (multiple ones in case
> of merges) additionally to the start point. Remote transmits only
> contain the commits within this range, and GC also just scans
> the range (instead of following all parents).

Your post reminded me of this: http://progit.org/2010/03/17/replace.html

Josh


* Re: Cutting history
From: Chris Frey @ 2010-07-10  6:43 UTC
  To: Joshua Jensen; +Cc: weigelt, git

On Fri, Jul 09, 2010 at 10:08:46PM -0600, Joshua Jensen wrote:
> Your post reminded me of this: http://progit.org/2010/03/17/replace.html

Wow.  This is what I get for not following git development more closely. :-)

Doesn't this open a potential security problem?  Suppose you want to pull
from another developer's repo, and he's replaced some of the history
in your own tree.  According to the above article, it is possible to
share replacements, presumably like any other ref.

Is it possible to have your own branch history "replaced" by fetching
someone else's repo into a remote branch?

- Chris


* Re: Cutting history
From: Jakub Narebski @ 2010-07-10  8:47 UTC
  To: Joshua Jensen; +Cc: weigelt, git

Joshua Jensen <jjensen@workspacewhiz.com> writes:

>   ----- Original Message -----
> From: Enrico Weigelt
> Date: 7/9/2010 9:25 PM
>
> > I'm using git for automatic backups (eg. database dumps). This
> > works quite well, but as time goes, the history (and so the repo)
> > gets larger and larger. It would be really nice to allow cutting
> > off old stuff (eg. after N commits in the past).

This is certainly Using Git For What It Was Not Intended...

> >
> > Maybe that could be done by introducing "stopper" tags: commits
> > that have an stopper-tag may have missing parents, and git-gc
> > can be told to ignore those parents and throw away everything
> > behind the stopper (if not referenced otherwise).
> >
> > A probably cleaner, but more invasive way could be making refs
> > to vectors, which may contain stop points (multiple ones in case
> > of merges) additionally to the start point. Remote transmits only
> > contain the commits within this range, and GC also just scans
> > the range (instead of following all parents).
>
> Your post reminded me of this: http://progit.org/2010/03/17/replace.html

Another solution would be to make history shallower like shallow clone
("git clone --depth <depth>") does it[1], and then prune history.  Or
you can use grafts to cauterize history.

Both of those solutions have disadvantages wrt pushing and pulling to
other repositories (shallow clone less so), but I don't think that
would be a problem for your situation.

[1] Documentation/technical/shallow.txt 
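
For illustration, a minimal sketch of the shallow-clone route on a
throwaway repository. The temporary directories, the five fake dumps and
the depth of 2 are all made up for the example. The file:// URL matters:
with a plain local path git ignores --depth and copies everything.

```shell
#!/bin/sh
# Sketch only: build a small history, then take a depth-limited copy of it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

src=$(mktemp -d)   # hypothetical "full" backup repository
dst=$(mktemp -d)   # parent directory for the truncated copy

cd "$src"
git init -q
for i in 1 2 3 4 5; do
    echo "dump $i" > dump.sql
    git add dump.sql
    git commit -q -m "dump $i"
done

# Keep only the two newest commits in the copy.
git clone -q --depth 2 "file://$src" "$dst/shallow"

depth=$(git -C "$dst/shallow" rev-list --count HEAD)
echo "commits in shallow clone: $depth"
```

The copy still records where history was cut, so it can be deepened
again later as long as the full repository is kept around.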

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: Cutting history
From: Martin Pettersson @ 2010-07-10 10:40 UTC
  To: git

On Saturday, July 10, 2010 03:47:14 pm you wrote:
> Joshua Jensen <jjensen@workspacewhiz.com> writes:
> >   ----- Original Message -----
> > 
> > From: Enrico Weigelt
> > Date: 7/9/2010 9:25 PM
> > 
> > > I'm using git for automatic backups (eg. database dumps). This
> > > works quite well, but as time goes, the history (and so the repo)
> > > gets larger and larger. It would be really nice to allow cutting
> > > off old stuff (eg. after N commits in the past).
> 
> This is certainly Using Git For What It Was Not Intended...
> 
> > > Maybe that could be done by introducing "stopper" tags: commits
> > > that have an stopper-tag may have missing parents, and git-gc
> > > can be told to ignore those parents and throw away everything
> > > behind the stopper (if not referenced otherwise).
> > > 
> > > A probably cleaner, but more invasive way could be making refs
> > > to vectors, which may contain stop points (multiple ones in case
> > > of merges) additionally to the start point. Remote transmits only
> > > contain the commits within this range, and GC also just scans
> > > the range (instead of following all parents).
> > 
> > Your post reminded me of this: http://progit.org/2010/03/17/replace.html
> 
> Another solution would be to make history shallower like shallow clone
> ("git clone --depth <depth>") does it[1], and then prune history.  Or
> you can use grafts to cauterize history.
> 
> Both of those solutions have disadvantages wrt pushing and pulling to
> other repositories (shallow clone less so), but I don't think that
> would be a problem for your situation.
> 
> [1] Documentation/technical/shallow.txt

Don't complicate things, just make a new repo when the old one gets too
large. That is what I do, and for me it is the best backup system I have
ever had.
Martin


* Re: Cutting history
From: Ævar Arnfjörð Bjarmason @ 2010-07-10 11:58 UTC
  To: Jakub Narebski; +Cc: Joshua Jensen, weigelt, git

On Sat, Jul 10, 2010 at 08:47, Jakub Narebski <jnareb@gmail.com> wrote:
> Joshua Jensen <jjensen@workspacewhiz.com> writes:
>
>>   ----- Original Message -----
>> From: Enrico Weigelt
>> Date: 7/9/2010 9:25 PM
>>
>> > I'm using git for automatic backups (eg. database dumps). This
>> > works quite well, but as time goes, the history (and so the repo)
>> > gets larger and larger. It would be really nice to allow cutting
>> > off old stuff (eg. after N commits in the past).
>
> This is certainly Using Git For What It Was Not Intended...

It actually works very well though. I use Git to back up MySQL
databases like this.

Here's the script I use to dump MySQL databases:

    http://github.com/avar/linode-etc/blob/master/bin/cron/mysqldump-to-git

And a small wrapper to dump them all:

    http://github.com/avar/linode-etc/blob/master/bin/cron/mysqldump-to-git-all

I make dumps every 6 hours:

    http://github.com/avar/linode-etc/blob/master/cron.d/v-mysql-git-backup

And after each dump I repack & prune (some of this is probably
redundant given the linear history) the repository:

    http://github.com/avar/linode-etc/blob/master/bin/cron/git-repack-and-gc-dir
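
A minimal sketch of what a repack-and-prune step can look like on a
throwaway repository. This is not the linked script: the temporary
directory, the three fake dumps and the exact options are assumptions
made for the example.

```shell
#!/bin/sh
# Sketch only: commit a few dumps, then repack and prune the repository.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)   # hypothetical backup repository
cd "$repo"
git init -q
for i in 1 2 3; do
    echo "dump $i" > dump.sql
    git add dump.sql
    git commit -q -m "dump $i"
done

git repack -a -d -q      # collect all reachable objects into a single pack
git gc -q --prune=now    # drop unreachable objects and leftover loose ones

# "count" in the verbose output is the number of loose objects.
loose=$(git count-objects -v | sed -n 's/^count: //p')
echo "loose objects left: $loose"
```

Running both commands overlaps, since gc repacks on its own, which
matches the remark above that some of this is probably redundant.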

And here's a graph showing how big the dumps get:

    http://munin.nix.is/nix.is/v.nix.is/dirs_var_backup_mysql.html

The climbing curves before the drop in size are from before I started
repacking the repositories.

As for pruning old history, I thought this *should* work for pruning
history older than 7 days (given that you dump daily):

    git rebase --strategy=base --onto master~8 master~7

But of course that deletes the new commits. I need to brush up on my
understanding of rebase. Maybe someone else on the list knows how to do
this. I thought git rebase --interactive might work, but I can't get
it to display the root commit. Maybe you need git-filter-branch.


* Re: Cutting history
From: Ævar Arnfjörð Bjarmason @ 2010-07-10 20:12 UTC
  To: Jakub Narebski; +Cc: Joshua Jensen, weigelt, git

On Sat, Jul 10, 2010 at 11:58, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> As for pruning old history, I thought this *should* work for pruning
> history older than 7 days (given that you dump daily):
>
>    git rebase --strategy=base --onto master~8 master~7
>
> But of course that deletes new commits. I need to freshen up on my
> rebase understanding. Maybe someone else on list knows how to do
> that. I thought git rebase --interactive might work, but I can't get
> it to display the root commit. Maybe you need git-filter-branch.

Thiago Macieira on #git provided the answer. You can do that with
grafts and git filter-branch. E.g. rewriting the history so that you
only have the 7 latest commits:

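    # Make the 7th-newest commit a parentless root via the grafts file,
    # bail out if fewer than 7 commits exist, then make the cut permanent.
    git rev-list HEAD | sed '7q;d' > .git/info/grafts &&
    test -s .git/info/grafts &&
    git filter-branch -f HEAD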
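
The same cut can be tried end to end on a throwaway repository. The
sketch below is an assumption-laden variant, not the recipe above: it
swaps the grafts file for "git replace --graft" (available in newer git
versions), and the temporary directory and the ten fake dumps are made
up. FILTER_BRANCH_SQUELCH_WARNING only silences the warning pause that
recent versions of filter-branch add.

```shell
#!/bin/sh
# Sketch only: keep the newest 7 commits of a 10-commit test history.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)   # hypothetical backup repository
cd "$repo"
git init -q
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "dump $i" > dump.sql
    git add dump.sql
    git commit -q -m "dump $i"
done

# The 7th commit counting back from HEAD becomes the new root.
cut=$(git rev-list HEAD | sed '7q;d')
test -n "$cut"

git replace --graft "$cut"             # no parents given: pretend $cut is a root
git filter-branch -f HEAD >/dev/null   # rewrite history so the cut is permanent
git replace -d "$cut" >/dev/null       # the temporary replacement is no longer needed

kept=$(git rev-list --count HEAD)
echo "commits kept: $kept"
```

The old commits are not freed right away: filter-branch keeps a backup
under refs/original/ and the reflog still points at them, so disk space
only shrinks once those are gone and the repository has been gc'ed.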
