git.vger.kernel.org archive mirror
* Re: worlds slowest git repo- what to do?
       [not found] <5374F7C6.5030205@gmail.com>
@ 2014-05-15 19:06 ` Philip Oakley
  2014-05-15 19:48   ` [git-users] " Sam Vilain
  2014-05-16 10:13   ` Duy Nguyen
  0 siblings, 2 replies; 5+ messages in thread
From: Philip Oakley @ 2014-05-15 19:06 UTC (permalink / raw)
  To: John Fisher, git-users-/JYPxA39Uh5TLH3MbocFFw; +Cc: Git List

From: "John Fisher" <fishook2033-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>I assert, based on one piece of evidence (a post from a Facebook dev), 
>that I now have the world's biggest and slowest git
> repository, and I am not a happy guy. I used to have the world's 
> biggest CVS repository, but CVS can't handle multi-GB
> files. So I moved the repo to git, because we are using that for 
> our new projects.
>
> goal:
> keep 150 GB of files (mostly binary), from tiny to over 8 GB, in a 
> version-control system.
>
> problem:
> git is absurdly slow, think hours, on fast hardware.
>
> question:
> any suggestions beyond these-
> http://git-annex.branchable.com/
> https://github.com/jedbrown/git-fat
> https://github.com/schacon/git-media
> http://code.google.com/p/boar/
> subversion
>
> ?

At the moment some of the developers are looking at speeding up 
operations on very large repos, though I think they have code repos in 
mind rather than large-file repos. They were looking for large repos to 
test some of the code on ;-)

I've copied the Git list in case they want to make any suggestions.

>
>
> Thanks.
>
> -- 
Philip 

-- 
You received this message because you are subscribed to the Google Groups "Git for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email to git-users+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [git-users] worlds slowest git repo- what to do?
  2014-05-15 19:06 ` worlds slowest git repo- what to do? Philip Oakley
@ 2014-05-15 19:48   ` Sam Vilain
  2014-05-16 10:13   ` Duy Nguyen
  1 sibling, 0 replies; 5+ messages in thread
From: Sam Vilain @ 2014-05-15 19:48 UTC (permalink / raw)
  To: Philip Oakley, John Fisher, git-users; +Cc: Git List

On 05/15/2014 12:06 PM, Philip Oakley wrote:
> From: "John Fisher" <fishook2033@gmail.com>
>> I assert, based on one piece of evidence (a post from a Facebook dev),
>> that I now have the world's biggest and slowest git
>> repository, and I am not a happy guy. I used to have the world's
>> biggest CVS repository, but CVS can't handle multi-GB
>> files. So I moved the repo to git, because we are using that
>> for our new projects.
>>
>> goal:
>> keep 150 GB of files (mostly binary), from tiny to over 8 GB, in a
>> version-control system.
>>
>> problem:
>> git is absurdly slow, think hours, on fast hardware.
>>
>> question:
>> any suggestions beyond these-
>> http://git-annex.branchable.com/
>> https://github.com/jedbrown/git-fat
>> https://github.com/schacon/git-media
>> http://code.google.com/p/boar/
>> subversion
>>

You could shard.  Break the problem up into smaller repositories, e.g.
via submodules.  Try ~128 shards; I'd expect 129 small clones to
complete faster than a single 150 GB clone, as well as being resumable, etc.

The first challenge will be figuring out what to shard on, and how to
lay out the repository.  You could keep all of the large files in their
own directory, with the main repository holding only symlinks into the
sharded area.  In that case, I would recommend sharding by the date the
blob was introduced, so that there's a good chance you won't need to
clone everything forever; shards with few files relevant to the current
version could in theory be retired.  Or, if the directory structure
already suits it, you could use submodules directly.

The second challenge will be writing the filter-branch script for this :-)
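A minimal sketch of the layout being suggested, run against throwaway repos. All repository names, paths, and file names below are invented for illustration; the thread does not prescribe them.

```shell
# Hypothetical sketch of the sharded layout: a shard repo holds the big
# binaries, the main repo tracks it as a submodule, and symlinks in the
# main worktree point at the blobs inside the shard.
set -e
demo=$(mktemp -d)
cd "$demo"

git init -q shard-2014
( cd shard-2014
  git config user.email demo@example.com
  git config user.name "Demo"
  # tiny stand-in for a multi-GB release binary
  dd if=/dev/zero of=release.bin bs=1024 count=16 2>/dev/null
  git add release.bin
  git commit -q -m "shard 2014: add release.bin" )

git init -q main
cd main
git config user.email demo@example.com
git config user.name "Demo"
# newer git forbids file-transport submodule clones by default;
# allow it just for this local demo
git -c protocol.file.allow=always submodule --quiet add \
    "$demo/shard-2014" shards/2014
# the main tree carries only a symlink into the shard
ln -s shards/2014/release.bin release.bin
git add release.bin
git commit -q -m "track release.bin via shard submodule"

git ls-files
```

A clone of `main` then only pulls the shards it initializes, which is the point of the exercise: old shards can simply never be initialized.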

Good luck,
Sam


* Re: [git-users] worlds slowest git repo- what to do?
  2014-05-15 19:06 ` worlds slowest git repo- what to do? Philip Oakley
  2014-05-15 19:48   ` [git-users] " Sam Vilain
@ 2014-05-16 10:13   ` Duy Nguyen
       [not found]     ` <CACsJy8CmiW88tNavRphZa_uMU=jVUCQE6cw5+t2AYnf5dDmcsQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 5+ messages in thread
From: Duy Nguyen @ 2014-05-16 10:13 UTC (permalink / raw)
  To: John Fisher; +Cc: git-users, Git List, Philip Oakley

On Fri, May 16, 2014 at 2:06 AM, Philip Oakley <philipoakley@iee.org> wrote:
> From: "John Fisher" <fishook2033@gmail.com>
>>
>> I assert, based on one piece of evidence (a post from a Facebook dev), that
>> I now have the world's biggest and slowest git
>> repository, and I am not a happy guy. I used to have the world's biggest
>> CVS repository, but CVS can't handle multi-GB
>> files. So I moved the repo to git, because we are using that for our
>> new projects.
>>
>> goal:
>> keep 150 GB of files (mostly binary), from tiny to over 8 GB, in a
>> version-control system.

I think your best bet so far is git-annex (or maybe bup) for dealing
with huge files. I plan on resurrecting Junio's split-blob series to
make core git handle huge files better, but there's no ETA on that.
The problem here is about file size, not the number of files or
history depth, right?

>> problem:
>> git is absurdly slow, think hours, on fast hardware.

Probably known issues. But some elaboration would be nice (e.g. which
operations are slow, how slow, some more detailed characteristics of the
repo...) in case new problems pop up.
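For gathering those characteristics, a handful of stock git commands give a quick profile. The demo below sets up a throwaway repo just so the commands have somewhere to run; in practice you would run them inside the slow repository itself.

```shell
# Quick repo profile: object counts, on-disk size, worktree file count,
# wall-clock timing of a slow operation, and an internal trace.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email demo@example.com
git config user.name "Demo"
echo hello > file.txt
git add file.txt
git commit -q -m "seed"

# how many objects, and how much disk do packs/loose objects use?
git count-objects -v

# how many tracked files does status have to stat?
git ls-files | wc -l

# portable wall-clock timing of the operation reported as slow
start=$(date +%s)
git status >/dev/null
end=$(date +%s)
echo "status took $((end - start))s"

# verbose trace of what git is doing internally (written to stderr)
GIT_TRACE=1 git status >/dev/null 2>trace.log
head -n 3 trace.log
```

`count-objects -v` and the `ls-files` count together usually make it obvious whether the pain is object size, file count, or history depth.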
-- 
Duy


* Re: worlds slowest git repo- what to do?
       [not found]     ` <CACsJy8CmiW88tNavRphZa_uMU=jVUCQE6cw5+t2AYnf5dDmcsQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-05-16 21:22       ` John Fisher
  2014-05-17  1:49         ` [git-users] " Duy Nguyen
  0 siblings, 1 reply; 5+ messages in thread
From: John Fisher @ 2014-05-16 21:22 UTC (permalink / raw)
  To: Duy Nguyen, John Fisher
  Cc: git-users-/JYPxA39Uh5TLH3MbocFFw, Git List, Philip Oakley


On 05/16/2014 03:13 AM, Duy Nguyen wrote:
> On Fri, May 16, 2014 at 2:06 AM, Philip Oakley <philipoakley-7KbaBNvhQFM@public.gmane.org> wrote:
>> From: "John Fisher" <fishook2033-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>> I assert, based on one piece of evidence (a post from a Facebook dev), that
>>> I now have the world's biggest and slowest git
>>> repository, and I am not a happy guy. I used to have the world's biggest
>>> CVS repository, but CVS can't handle multi-GB
>>> files. So I moved the repo to git, because we are using that for our
>>> new projects.
>>>
>>> goal:
>>> keep 150 GB of files (mostly binary), from tiny to over 8 GB, in a
>>> version-control system.
> I think your best bet so far is git-annex 

Good, I am looking at that.

> (or maybe bup) for dealing
> with huge files. I plan on resurrecting Junio's split-blob series to
> make core git handle huge files better, but there's no eta on that.
> The problem here is about file size, not the number of files, or
> history depth, right?

When things here calm down, I could easily test the repo without the giant files, leaving 99% of files in the repo.
There is hardly any history depth, because these are releases, version-controlled by directory name. As has been
suggested, I could be forced to abandon version control, even to the point of just using rsync.  But I've been doing
this with CVS for 10 years now and I hate to change or in any way move away from KISS. Moving it to Git may not have
been one of my better ideas...


> Probably known issues. But some elaboration would be nice (e.g. what operation is slow, how slow, some more detail
> characteristics of the repo..) in case new problems pop up. 

So far I have done add, commit, status, and clone - commit and status are slow; add seems to depend on the files
involved, and clone seems to run at network speed.
I can provide metrics later, see above. Email me offline with what you want.

John



* Re: [git-users] worlds slowest git repo- what to do?
  2014-05-16 21:22       ` John Fisher
@ 2014-05-17  1:49         ` Duy Nguyen
  0 siblings, 0 replies; 5+ messages in thread
From: Duy Nguyen @ 2014-05-17  1:49 UTC (permalink / raw)
  To: John Fisher; +Cc: git-users, Git List, Philip Oakley

On Sat, May 17, 2014 at 4:22 AM, John Fisher <fishook2033@gmail.com> wrote:
>> Probably known issues. But some elaboration would be nice (e.g. what operation is slow, how slow, some more detail
>> characteristics of the repo..) in case new problems pop up.
>
> so far I have done add, commit, status, clone - commit and status are slow; add seems to depend on the files involved,
> clone seems to run at network speed.
> I can provide metrics later, see above. email me offline with what you want.

OK, "commit -a" should be just as slow as "add", but otherwise commit and
status should be fast unless there are lots of files (how many are in your
worktree?) or we hit something that makes us look into (large) file
content anyway.
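One common case where status does look into file content: the cached stat information for a file is invalidated (say, its mtime changes) while the content stays the same, so git must re-hash the file to confirm it is unchanged. A small demonstration, with a tiny file standing in for a multi-GB one:

```shell
# Demo: a touched file forces status to re-read content once, after
# which git refreshes its stat cache and the next status is cheap again.
set -e
wt=$(mktemp -d)
git init -q "$wt"
cd "$wt"
git config user.email demo@example.com
git config user.name "Demo"
# tiny stand-in for a huge binary
dd if=/dev/zero of=big.bin bs=1024 count=64 2>/dev/null
git add big.bin
git commit -q -m "add big.bin"

# invalidate the cached stat info: same content, new mtime
touch big.bin

# this status must re-hash big.bin to see that nothing changed...
git status --porcelain
# ...and it opportunistically rewrites the index, so this one is cheap
git status --porcelain
```

With an 8 GB file that first re-hash alone is a lot of I/O, which is consistent with status being slow only sometimes.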
-- 
Duy

