* git for game development?
@ 2011-08-23 23:06 Lawrence Brett
  2011-08-23 23:32 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Lawrence Brett @ 2011-08-23 23:06 UTC (permalink / raw)
  To: git

Hello,

I am very interested in using git for game development.  I will be working
with a lot of binaries (textures, 3d assets, etc.) in addition to source
files.  I'd like to be able to version these files, but I understand that
big binaries aren't git's forte.  I've found several possible workarounds
(git submodules, git-media, git-annex), but the one that seems most
promising is bup.  I started a thread on the bup mailing list to ask about
the best way to use bup with git for my purposes.  One of the respondents
suggested forking git itself to include bup functionality, thereby extending
git to handle binaries efficiently.

My question for this group is:  would there be interest in incorporating
this sort of functionality into git core?  I would certainly find it
compelling as a user, but have no idea how it would fit into the bigger
picture.

Thanks in advance!

Cliff

P.S.  I also heartily welcome any advice/insight on my use case.  :-)


* Re: git for game development?
  2011-08-23 23:06 git for game development? Lawrence Brett
@ 2011-08-23 23:32 ` Junio C Hamano
  2011-08-24  1:24 ` Jeff King
  2011-08-25  6:53 ` One MMORPG git facts Marat Radchenko
  2 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2011-08-23 23:32 UTC (permalink / raw)
  To: Lawrence Brett; +Cc: git

Lawrence Brett <lcbrett@gmail.com> writes:

> My question for this group is:  would there be interest in incorporating
> this sort of functionality into git core?  I would certainly find it
> compelling as a user, but have no idea how it would fit into the bigger
> picture.

I personally think it is too early to ask that question; first set up a
workable workflow around bup (or a combination of bup and git), get used
to it, and find out what the real pain points would be if you used only
git without bup.

Efforts to tweak tools by people who are not yet familiar with the tools
they are trying to use unfortunately often tend to go in wrong directions
and become wasted effort.


* Re: git for game development?
  2011-08-23 23:06 git for game development? Lawrence Brett
  2011-08-23 23:32 ` Junio C Hamano
@ 2011-08-24  1:24 ` Jeff King
  2011-08-24 17:17   ` Junio C Hamano
  2011-08-25  6:53 ` One MMORPG git facts Marat Radchenko
  2 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2011-08-24  1:24 UTC (permalink / raw)
  To: Lawrence Brett; +Cc: git

On Tue, Aug 23, 2011 at 04:06:47PM -0700, Lawrence Brett wrote:

> I am very interested in using git for game development.  I will be working
> with a lot of binaries (textures, 3d assets, etc.) in addition to source
> files.  I'd like to be able to version these files, but I understand that
> big binaries aren't git's forte.  I've found several possible workarounds
> (git submodules, git-media, git-annex), but the one that seems most
> promising is bup.  I started a thread on the bup mailing list to ask about
> the best way to use bup with git for my purposes.  One of the respondents
> suggested forking git itself to include bup functionality, thereby extending
> git to handle binaries efficiently.
> 
> My question for this group is:  would there be interest in incorporating
> this sort of functionality into git core?  I would certainly find it
> compelling as a user, but have no idea how it would fit into the bigger
> picture.

Something bup-like in git-core might eventually be good. But IIRC, bup
introduces new object types, which mixes the abstract view of the data
format (i.e., commits, trees, and blobs indexed by sha1) with the
implementation details (e.g., now we have both loose objects in their
own files as well as delta-compressed objects in packfiles).

That means that bup-git clients and non-bup git clients don't interact
very well, where "non-bup" means either a client that doesn't understand
the bup objects, or one that chooses not to use a bup-like encoding for
particular blobs.

I don't remember all of the details of bup, but if it's possible to
implement something similar at a lower level (i.e., at the layer of
packfiles or object storage), then it can be a purely local thing, and
the compatibility issues can go away.

-Peff

PS I also agree with Junio's comment that we are not at the "planning a
solution" stage with big files, but rather at the "trying it and getting
experience on what works and what doesn't" stage.


* Re: git for game development?
  2011-08-24  1:24 ` Jeff King
@ 2011-08-24 17:17   ` Junio C Hamano
  2011-08-24 18:26     ` Jeff King
  2011-08-27 15:32     ` Michael Witten
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2011-08-24 17:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Lawrence Brett, git

Jeff King <peff@peff.net> writes:

> I don't remember all of the details of bup, but if it's possible to
> implement something similar at a lower level (i.e., at the layer of
> packfiles or object storage), then it can be a purely local thing, and
> the compatibility issues can go away.

I tend to agree, and we might be closer than we realize.

I suspect that people with large binary assets were scared away by rumors
they heard second-hand, based on bad experiences other people had before
any of the recent efforts made in various "large Git" topics, and they
themselves haven't tried recent versions of Git enough to be able to tell
what the remaining pain points are. I wouldn't be surprised if none of the
core Git people tried shoving huge binary assets in test repositories with
recent versions of Git---I certainly haven't.

We used to always map the blob data as a whole for anything we do, but
these days, with changes like your abb371a (diff: don't retrieve binary
blobs for diffstat, 2011-02-19) and my recent "send large blob straight to
a new pack" and "stream large data out to the working tree without holding
everything in core while checking out" topics, I suspect that the support
for local usage of large blobs might be sufficiently better than the old
days. Git might even be usable locally without anything else, which I find
implausible, but I wouldn't be surprised if there remained only a handful
of minor things that we need to add to make it usable.
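
If anyone wants to gather that experience, a quick test might look like
this (a sketch, assuming a git new enough to carry the topics above;
core.bigFileThreshold is the relevant knob here, and the sizes are only
examples):

  git init big-test && cd big-test
  git config core.bigFileThreshold 512m
  dd if=/dev/urandom of=asset.bin bs=1M count=600  # incompressible, above threshold
  git add asset.bin          # should stream straight into a new pack
  ls .git/objects/pack/      # the blob lands here, not as a loose object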

People toyed around with ideas to have a separate object store
representation for large and possibly incompressible blobs (a possible
complaint being that it is pointless to send them even to its own
packfile). One possible implementation would be to add a new huge
hierarchy under $GIT_DIR/objects/, compute the object name exactly the
same way for huge blobs as we normally would (i.e. hash concatenation of
object header and then contents) to decide which subdirectory under the
"huge" hierarchy to store the data (huge/[0-9a-f]{2}/[0-9a-f]{38}/ like we
do for loose objects, or perhaps huge/[0-9a-f]{40}/ expecting that there
won't be very many). The data can be stored unmodified as a file in that
directory, with type stored in a separate file---that way, we won't have
to compress, but we just copy. You still need to hash it at least once to
come up with the object name, but that is what gives us integrity checks,
is unavoidable and is not going to change.
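
Purely for illustration, a blob stored that way might end up like this
(nothing below is implemented; the "huge" paths and the separate .type
file are hypothetical):

  sha1=$(git hash-object big-asset.bin)     # hashes "blob <size>\0" + contents
  dir=$(printf %s "$sha1" | cut -c1-2)
  file=$(printf %s "$sha1" | cut -c3-40)
  mkdir -p ".git/objects/huge/$dir"
  cp big-asset.bin ".git/objects/huge/$dir/$file"  # straight copy, no deflate
  echo blob >".git/objects/huge/$dir/$file.type"   # object type kept separately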

The sha1_object_info() layer can learn to return the type and size from
such a representation, and you can further tweak the same places as the
"streaming checkout" and the "checkin to a pack" topics touched to support
such a representation.

I would suspect that the local object representation is _not_ the largest
pain point; such a separate object store representation is not buying us
very much over a simpler "single large blob in a separate packfile", and
if the counter-argument is "no, decompressing still costs a lot", then the
real issue might be that we decompress and look at the data when we do not
have to (i.e. issues similar to what abb371a addressed), not that
"decompress vs straight copy makes a big difference".

I would further suspect that we _might_ need better support for local
repacking and object transfer, with or without such a third object
representation.


* Re: git for game development?
  2011-08-24 17:17   ` Junio C Hamano
@ 2011-08-24 18:26     ` Jeff King
  2011-08-27 15:32     ` Michael Witten
  1 sibling, 0 replies; 9+ messages in thread
From: Jeff King @ 2011-08-24 18:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lawrence Brett, git

On Wed, Aug 24, 2011 at 10:17:49AM -0700, Junio C Hamano wrote:

> I suspect that people with large binary assets were scared away by rumors
> they heard second-hand, based on bad experiences other people had before
> any of the recent efforts made in various "large Git" topics, and they
> themselves haven't tried recent versions of Git enough to be able to tell
> what the remaining pain points are. I wouldn't be surprised if none of the
> core Git people tried shoving huge binary assets in test repositories with
> recent versions of Git---I certainly haven't.

I haven't tried anything really big in a while. My personal interest in
big file support has been:

  1. Mid-sized photos and videos (objects top out around 50M, total repo
     size is 4G packed). Most commits are additions or tweaks of exif
     tags (so they delta well). Using gitattributes (and especially
     textconv caching; see the sketch after item 2), it's really quite
     pleasant to use. Doing a full repack is my only complaint; the
     delta-compression isn't bad, but just the I/O on rewriting the
     whole thing is a killer.

  2. Storing an entire audio collection in flac. Median file size is
     only around 20M, but the whole repo is 120G.  Obviously compression
     doesn't buy much, so a git repo plus checkout is 240G, which is
     pretty hefty for most laptops. I played with this early on, but
     gave up; the data storage model just doesn't make sense.
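
For reference, the setup in (1) is roughly the following (a sketch;
exiftool as the converter is just one choice, and cachetextconv needs a
reasonably recent git):

  echo '*.jpg diff=exif' >>.gitattributes
  git config diff.exif.textconv exiftool    # diff exif tags, not raw bytes
  git config diff.exif.cachetextconv true   # cache converted output across runs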

The two common use cases that aren't represented here are:

  3. Big files, not just big repos. I.e., files that are 1G or more.

  4. Medium-big files that don't delta well (e.g., metadata tweaks do
     delta well; rewriting media assets for a game don't delta well).

I think recent changes (like putting big files straight to packs) make
(3) and (4) reasonably pleasant.

I'm not sure of the right answer for (1). The repack is the only
annoying thing, but not repacking is not satisfying either: you don't
get deltas where they are applicable, and the server is always
re-examining the pack for possible deltas on fetch and push. Some sort
of hybrid loose-pack storage would be nice: store the delta chains for
big files in their own individual packs, but keep everything else in a
separate pack. We would want some kind of meta-index over all of these
little pack-files, not just individual pack-file indices.

But (2) is the hardest one. It would be nice if we had some kind of
local-remote hybrid storage, where objects were fetched on demand from
somewhere else. For example, developers on workstations with a fast
local network to a storage server wouldn't have to replicate all of the
objects locally. And for a true distributed setup, when the fast network
isn't there, it would be nice to fail gracefully (which maybe just means
saying "sorry, we can't do 'log -p' right now; try 'log --raw'").

I wonder how close one can get on (2) using alternates and a
network-mounted filesystem.
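
A rough sketch of that experiment, with made-up paths (note that plain
alternates don't fail gracefully if the mount goes away):

  # the storage server exports a shared object store at /mnt/big-objects.git
  git clone --reference /mnt/big-objects.git git://server/game.git game
  cat game/.git/objects/info/alternates   # now points at the shared store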

> People toyed around with ideas to have a separate object store
> representation for large and possibly incompressible blobs (a possible
> complaint being that it is pointless to send them even to its own
> packfile). One possible implementation would be to add a new huge
> hierarchy under $GIT_DIR/objects/, compute the object name exactly the
> same way for huge blobs as we normally would (i.e. hash concatenation of
> object header and then contents) to decide which subdirectory under the
> "huge" hierarchy to store the data (huge/[0-9a-f]{2}/[0-9a-f]{38}/ like we
> do for loose objects, or perhaps huge/[0-9a-f]{40}/ expecting that there
> won't be very many). The data can be stored unmodified as a file in that
> directory, with type stored in a separate file---that way, we won't have
> to compress, but we just copy. You still need to hash it at least once to
> come up with the object name, but that is what gives us integrity checks,
> is unavoidable and is not going to change.

Yeah. I think one of the bonuses there is that some filesystems are
capable of referencing the same inodes in a copy-on-write way, so "add"
and "checkout" cease to be copy operations and become inode-linking
operations, which is a big win, both for speed and storage.
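
For illustration, this is roughly what a CoW filesystem like btrfs
already lets you do from userspace (file names are made up):

  cp --reflink=always asset.bin copy.bin   # shares blocks; no data is copied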

I've had dreams of using hard-linking to do something similar, but it's
just not safe enough without some filesystem-level copy-on-write
protection.

-Peff


* One MMORPG git facts
  2011-08-23 23:06 git for game development? Lawrence Brett
  2011-08-23 23:32 ` Junio C Hamano
  2011-08-24  1:24 ` Jeff King
@ 2011-08-25  6:53 ` Marat Radchenko
  2011-08-25  7:57   ` J.H.
  2 siblings, 1 reply; 9+ messages in thread
From: Marat Radchenko @ 2011-08-25  6:53 UTC (permalink / raw)
  To: git

Lawrence Brett <lcbrett <at> gmail.com> writes:

> 
> Hello,
> 
> I am very interested in using git for game development.  I will be working
> with a lot of binaries (textures, 3d assets, etc.) in addition to source
> files.  I'd like to be able to version these files, but I understand that
> big binaries aren't git's forte.

Define "big".

I have one MMORPG here under Git. 250k revisions, 500k files in working dir
(7Gb), 200 commits daily, 250Gb Git repo, SVN upstream repo of ~1Tb.

Some facts:
1. It is unusable on a 32-bit machine (here and there it hits the memory
limit for a single process).
2. It is unusable on Windows (because there's no 64-bit msysgit).
3. git status takes 3s with hot disk caches (7 mins with cold).
4. History traversal means really massive I/O.
5. Current setup: 120Gb 10k rpm disk for everything but .git/objects/pack,
separate 500Gb (will be upgraded to 1Tb soon) disk for packs.
6. git gc is PAIN. I do it on weekends because it takes more than a day to
run. Also, the limits for git pack-objects should be configured VERY
carefully; it can either run out of ram or take weeks to run if configured
improperly (example settings after this list).
7. With default gc settings, git wants to gc daily (but gc takes more than
a day, so if you follow its desire, you're in a gc loop). I set the object
limit to a very high value and invoke gc manually.
8. svn users cannot sensibly do status on the whole working copy (more
than 10 mins).
9. svn users only update with a nightly script (40 mins).
10. git commit takes several seconds because it writes a 70Mb commit file.
11. It is a good idea to run git status often so that working copy info
isn't evicted from OS disk caches (remember, 3s vs 7 min).
12. Cloning the git repo is one more pain. We have a 100mbps network here,
so fetching 250Gb takes some time. But worse, when cloning via the git://
protocol, git sits in the "Resolving deltas" stage for several hours after
fetching. So rsync is used for the initial clone.
13. Here and there I hit scalability issues in various git commands (which
I report to the mailing list, and most [well, all except the one I
reported this week] of which get fixed).
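
For the curious, the kind of settings I mean in 6 and 7 (a sketch only;
the exact values depend entirely on your hardware):

  git config pack.windowMemory 256m  # cap delta window ram for pack-objects
  git config pack.depth 50
  git config pack.threads 4
  git config gc.auto 999999999       # never auto-gc; run "git gc" by hand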

Hope this helps give an idea of how git behaves at a large scale. Overall,
I'm happy with it and won't return to svn.


* Re: One MMORPG git facts
  2011-08-25  6:53 ` One MMORPG git facts Marat Radchenko
@ 2011-08-25  7:57   ` J.H.
  2011-08-25 16:02     ` Marat Radchenko
  0 siblings, 1 reply; 9+ messages in thread
From: J.H. @ 2011-08-25  7:57 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: git

On 08/24/2011 11:53 PM, Marat Radchenko wrote:
> Lawrence Brett <lcbrett <at> gmail.com> writes:
> 
>>
>> Hello,
>>
>> I am very interested in using git for game development.  I will be working
>> with a lot of binaries (textures, 3d assets, etc.) in addition to source
>> files.  I'd like to be able to version these files, but I understand that
>> big binaries aren't git's forte.
> 
> Define "big".
> 
> I have one MMORPG here under Git. 250k revisions, 500k files in working dir
> (7Gb), 200 commits daily, 250Gb Git repo, SVN upstream repo of ~1Tb.

Given the differences, I'm morbidly curious: which actually ends up
being the more usable version control system for a project of this scale?
It sounds like (from what you've said) git is generally faster, assuming
it can get enough resources (which can obviously be hard at the scales
you're talking about).

- John 'Warthog9' Hawley


* Re: One MMORPG git facts
  2011-08-25  7:57   ` J.H.
@ 2011-08-25 16:02     ` Marat Radchenko
  0 siblings, 0 replies; 9+ messages in thread
From: Marat Radchenko @ 2011-08-25 16:02 UTC (permalink / raw)
  To: J.H.; +Cc: git

On 08/25/2011 11:57:07 MSD, J.H. <warthog9@eaglescrag.net> wrote:
> Given the differences, I'm morbidly curious: which actually ends up
> being the more usable version control system for a project of this scale?
> It sounds like (from what you've said) git is generally faster, assuming
> it can get enough resources (which can obviously be hard at the scales
> you're talking about).

Hard to compare (especially because I don't have a pure git environment,
only a git-svn clone), but I can give you some observations.

First, there are lots of non-geek people working on the MMORPG (quest
designers, modellers, text writers, map designers). Many of them find it
hard to understand DVCS concepts and prefer living with linear history in
a single branch (svn trunk). Their work is highly isolated from each other
(for example, maps are split into "regions" and only one person is allowed
to edit a given region at a time, only one modeller works on a particular
model, and each quest has one person responsible for it), so they don't
hit conflicts as often as programmers do. And since an svn up of the whole
tree takes 40 mins, they don't update during the work day but use a
nightly script for that, so the only thing they regularly run is svn
commit.

Second, there's TortoiseSVN, which allows easy (for non-geeks) GUI history
inspection.

Third, we have 200 commits per day (8 work hours), which is one commit
every 2.4 mins (actually much less during lunch and much more in the
morning, because programmers are not allowed to commit after 16:00), so
your copy is outdated all the time. If the upstream repo were git, one
would have to pull + push within those 2.4 mins, otherwise she would hit
a non-ff push. This could be fixed by using separate repos, though that
would complicate the git setup even more.

On the other hand, git is really great for programmers. Heck, svn still
doesn't have anything like "git log -u" (well, afaik, they finally added
it in 1.7)! Stash, bisect, local commits, history rewriting and cheap
branching (almost no branching happens in the svn repo, because it
involves either fetching 7Gb or [if svn switch is used] 10-30 mins [don't
remember exactly] of massive I/O) are very handy on a daily basis. Also,
git allows easy sharing of experimental changes between programmers
without touching the shared server (a sketch below). It also allows
atomic commits across the whole working copy (which is important when a
programmer changes server, client and data at the same time).
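
The sharing mentioned above looks roughly like this (host and path are
made up):

  git remote add alice ssh://alice-box/home/alice/game.git
  git fetch alice                              # grab her experimental branches
  git checkout -b experiment alice/experiment  # build on top of her work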

There are some decisions without which the repo could be smaller (for
example, client/server binaries are committed daily so that designers can
use them the next day), but these decisions were made long before I
joined the project and are very likely to stay this way.

To sum up: git is a wonderful (and very powerful) tool for programmers,
but too complex for non-tech users.


* Re: git for game development?
  2011-08-24 17:17   ` Junio C Hamano
  2011-08-24 18:26     ` Jeff King
@ 2011-08-27 15:32     ` Michael Witten
  1 sibling, 0 replies; 9+ messages in thread
From: Michael Witten @ 2011-08-27 15:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Avery Pennarun, Jeff King, Lawrence Brett, git

On Wed, Aug 24, 2011 at 17:17, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> I don't remember all of the details of bup, but if it's possible to
>> implement something similar at a lower level (i.e., at the layer of
>> packfiles or object storage), then it can be a purely local thing, and
>> the compatibility issues can go away.
>
> I tend to agree, and we might be closer than we realize.
>
> I suspect that people with large binary assets were scared away by rumors
> they heard second-hand, based on bad experiences other people had before
> any of the recent efforts made in various "large Git" topics, and they
> themselves haven't tried recent versions of Git enough to be able to tell
> what the remaining pain points are. I wouldn't be surprised if none of the
> core Git people tried shoving huge binary assets in test repositories with
> recent versions of Git---I certainly haven't.
>
> We used to always map the blob data as a whole for anything we do, but
> these days, with changes like your abb371a (diff: don't retrieve binary
> blobs for diffstat, 2011-02-19) and my recent "send large blob straight to
> a new pack" and "stream large data out to the working tree without holding
> everything in core while checking out" topics, I suspect that the support
> for local usage of large blobs might be sufficiently better than the old
> days. Git might even be usable locally without anything else, which I find
> implausible, but I wouldn't be surprised if there remained only a handful
> of minor things that we need to add to make it usable.
>
> People toyed around with ideas to have a separate object store
> representation for large and possibly incompressible blobs (a possible
> complaint being that it is pointless to send them even to its own
> packfile). One possible implementation would be to add a new huge
> hierarchy under $GIT_DIR/objects/, compute the object name exactly the
> same way for huge blobs as we normally would (i.e. hash concatenation of
> object header and then contents) to decide which subdirectory under the
> "huge" hierarchy to store the data (huge/[0-9a-f]{2}/[0-9a-f]{38}/ like we
> do for loose objects, or perhaps huge/[0-9a-f]{40}/ expecting that there
> won't be very many). The data can be stored unmodified as a file in that
> directory, with type stored in a separate file---that way, we won't have
> to compress, but we just copy. You still need to hash it at least once to
> come up with the object name, but that is what gives us integrity checks,
> is unavoidable and is not going to change.
>
> The sha1_object_info() layer can learn to return the type and size from
> such a representation, and you can further tweak the same places as the
> "streaming checkout" and the "checkin to a pack" topics touched to support
> such a representation.
>
> I would suspect that the local object representation is _not_ the largest
> pain point; such a separate object store representation is not buying us
> very much over a simpler "single large blob in a separate packfile", and
> if the counter-argument is "no, decompressing still costs a lot", then the
> real issue might be that we decompress and look at the data when we do not
> have to (i.e. issues similar to what abb371a addressed), not that
> "decompress vs straight copy makes a big difference".

I've added Avery to the Cc list, because he really needs to chime in here.

I am completely unqualified to comment on this, but I think it would be
silly to ignore the insights that Avery has about storing large objects;
`bup' uses rolling checksums, a `bloom filter' implementation, and who
knows what else.

