* git for game development?
From: Lawrence Brett @ 2011-08-23 23:06 UTC
To: git

Hello,

I am very interested in using git for game development. I will be working with a lot of binaries (textures, 3d assets, etc.) in addition to source files. I'd like to be able to version these files, but I understand that big binaries aren't git's forte. I've found several possible workarounds (git submodules, git-media, git-annex), but the one that seems most promising is bup. I started a thread on the bup mailing list to ask about the best way to use bup with git for my purposes. One of the respondents suggested forking git itself to include bup functionality, thereby extending git to handle binaries efficiently.

My question for this group is: would there be interest in incorporating this sort of functionality into git core? I would certainly find it compelling as a user, but have no idea how it would fit into the bigger picture.

Thanks in advance!
Cliff

P.S. I also heartily welcome any advice/insight on my use case. :-)
* Re: git for game development?
From: Junio C Hamano @ 2011-08-23 23:32 UTC
To: Lawrence Brett; Cc: git

Lawrence Brett <lcbrett@gmail.com> writes:

> My question for this group is: would there be interest in incorporating
> this sort of functionality into git core? I would certainly find it
> compelling as a user, but have no idea how it would fit into the bigger
> picture.

I personally think it is too early for you to ask that question; at least until you set up a workable workflow around bup (or a combination of bup and git), get used to it, and find out what the real pain points would be if you used only git without bup. Efforts to tweak tools by people who are not yet familiar with the tools they are trying to use unfortunately tend to go in wrong directions and become wasted effort.
* Re: git for game development?
From: Jeff King @ 2011-08-24 1:24 UTC
To: Lawrence Brett; Cc: git

On Tue, Aug 23, 2011 at 04:06:47PM -0700, Lawrence Brett wrote:

> I am very interested in using git for game development. I will be working
> with a lot of binaries (textures, 3d assets, etc.) in addition to source
> files. I'd like to be able to version these files, but I understand that
> big binaries aren't git's forte. I've found several possible workarounds
> (git submodules, git-media, git-annex), but the one that seems most
> promising is bup. I started a thread on the bup mailing list to ask about
> the best way to use bup with git for my purposes. One of the respondents
> suggested forking git itself to include bup functionality, thereby
> extending git to handle binaries efficiently.
>
> My question for this group is: would there be interest in incorporating
> this sort of functionality into git core? I would certainly find it
> compelling as a user, but have no idea how it would fit into the bigger
> picture.

Something bup-like in git-core might eventually be good. But IIRC, bup introduces new object types, which mixes the abstract view of the data format (i.e., commits, trees, and blobs indexed by sha1) with the implementation details (e.g., now we have both loose objects in their own files as well as delta-compressed objects in packfiles). That means that bup-git clients and non-bup git clients don't interact very well, where "non-bup" means either a client that doesn't understand the bup objects, or one that chooses not to use bup-like encoding for particular blobs.

I don't remember all of the details of bup, but if it's possible to implement something similar at a lower level (i.e., at the layer of packfiles or object storage), then it can be a purely local thing, and the compatibility issues go away.

-Peff

PS I also agree with Junio's comment that we are not at the "planning a solution" stage with big files, but rather at the "trying it and getting experience on what works and what doesn't" stage.
* Re: git for game development?
From: Junio C Hamano @ 2011-08-24 17:17 UTC
To: Jeff King; Cc: Lawrence Brett, git

Jeff King <peff@peff.net> writes:

> I don't remember all of the details of bup, but if it's possible to
> implement something similar at a lower level (i.e., at the layer of
> packfiles or object storage), then it can be a purely local thing, and
> the compatibility issues can go away.

I tend to agree, and we might be closer than we realize.

I suspect that people with large binary assets were scared away by rumors they heard second-hand, based on bad experiences other people had before any of the recent efforts made in various "large Git" topics, and they themselves haven't tried recent versions of Git enough to be able to tell what the remaining pain points are. I wouldn't be surprised if none of the core Git people have tried shoving huge binary assets into test repositories with recent versions of Git; I certainly haven't.

We used to always map the blob data as a whole for anything we do, but these days, with changes like your abb371a (diff: don't retrieve binary blobs for diffstat, 2011-02-19) and my recent "send large blob straight to a new pack" and "stream large data out to the working tree without holding everything in core while checking out" topics, I suspect that the support for local usage of large blobs might be sufficiently better than in the old days. Git might even be usable locally without anything else (which I find implausible), but I wouldn't be surprised if there remained only a handful of minor things that we need to add to make it usable.

People toyed around with ideas to have a separate object store representation for large and possibly incompressible blobs (a possible complaint being that it is pointless to send them even to their own packfile).
One possible implementation would be to add a new "huge" hierarchy under $GIT_DIR/objects/ and compute the object name for huge blobs exactly the same way we normally would (i.e. hash the concatenation of the object header and then the contents) to decide which subdirectory under the "huge" hierarchy to store the data in (huge/[0-9a-f]{2}/[0-9a-f]{38}/ like we do for loose objects, or perhaps huge/[0-9a-f]{40}/, expecting that there won't be very many). The data can be stored unmodified as a file in that directory, with the type stored in a separate file; that way, we don't have to compress, we just copy. You still need to hash the data at least once to come up with the object name, but that is what gives us integrity checks, is unavoidable, and is not going to change.

The sha1_object_info() layer can learn to return the type and size from such a representation, and you can further tweak the same places that the "streaming checkout" and the "checkin to a pack" topics touched to support such a representation.

I would suspect that the local object representation is _not_ the largest pain point; such a separate object store representation is not buying us very much over a simpler "single large blob in a separate packfile", and if the counter-argument is "no, decompressing still costs a lot", then the real issue might be that we decompress and look at the data when we do not have to (i.e. issues similar to what abb371a addressed), not that "decompress vs straight copy" makes a big difference. I would further suspect that we _might_ need better support for local repacking and object transfer, with or without such a third object representation.
* Re: git for game development?
From: Jeff King @ 2011-08-24 18:26 UTC
To: Junio C Hamano; Cc: Lawrence Brett, git

On Wed, Aug 24, 2011 at 10:17:49AM -0700, Junio C Hamano wrote:

> I suspect that people with large binary assets were scared away by rumors
> they heard second-hand, based on bad experiences other people had before
> any of the recent efforts made in various "large Git" topics, and they
> themselves haven't tried recent versions of Git enough to be able to tell
> what the remaining pain points are. I wouldn't be surprised if none of the
> core Git people tried shoving huge binary assets in test repositories with
> recent versions of Git---I certainly haven't.

I haven't tried anything really big in a while. My personal interest in big file support has been:

  1. Mid-sized photos and videos (objects top out around 50M, total repo size is 4G packed). Most commits are additions or tweaks of exif tags (so they delta well). Using gitattributes (and especially textconv caching), it's really quite pleasant to use. Doing a full repack is my only complaint; the delta compression isn't bad, but just the I/O of rewriting the whole thing is a killer.

  2. Storing an entire audio collection in flac. Median file size is only around 20M, but the whole repo is 120G. Obviously compression doesn't buy much, so a git repo plus checkout is 240G, which is pretty hefty for most laptops. I played with this early on, but gave up; the data storage model just doesn't make sense.

The two common use cases that aren't represented here are:

  3. Big files, not just big repos. I.e., files that are 1G or more.

  4. Medium-big files that don't delta well (e.g., metadata tweaks delta well; rewritten media assets for a game don't).
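The gitattributes/textconv setup mentioned in point 1 looks roughly like this. The `exif` driver name and the use of exiftool are illustrative assumptions, though `textconv` and `cachetextconv` are real git configuration keys:

```ini
# in .gitattributes: diff JPEGs through a custom textconv driver
#     *.jpg diff=exif
#
# in .git/config: define that driver
[diff "exif"]
	textconv = exiftool    # any command that dumps metadata as text works
	cachetextconv = true   # cache converted output so repeated diffs are cheap
```

With caching enabled, git stores the converted text of each blob, so diffing metadata changes across large media files does not re-run the converter every time.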
I think recent changes (like putting big files straight into packs) make (3) and (4) reasonably pleasant.

I'm not sure of the right answer for (1). The repack is the only annoying thing, but not repacking is not satisfying either: you don't get deltas where they are applicable, and the server is always re-examining the pack for possible deltas on fetch and push. Some sort of hybrid loose-pack storage would be nice: store delta chains for big files in their own individual packs, but otherwise keep everything in a separate pack. We would want some kind of meta-index over all of these little packfiles, not just individual packfile indices.

But (2) is the hardest one. It would be nice if we had some kind of local-remote hybrid storage, where objects were fetched on demand from somewhere else. For example, developers on workstations with a fast local network to a storage server wouldn't have to replicate all of the objects locally. And for a true distributed setup, when the fast network isn't there, it would be nice to fail gracefully (which maybe just means saying "sorry, we can't do 'log -p' right now; try 'log --raw'"). I wonder how close one can get on (2) using alternates and a network-mounted filesystem.

> People toyed around with ideas to have a separate object store
> representation for large and possibly incompressible blobs (a possible
> complaint being that it is pointless to send them even to its own
> packfile). One possible implementation would be to add a new huge
> hierarchy under $GIT_DIR/objects/, compute the object name exactly the
> same way for huge blobs as we normally would (i.e. hash concatenation of
> object header and then contents) to decide which subdirectory under the
> "huge" hierarchy to store the data (huge/[0-9a-f]{2}/[0-9a-f]{38}/ like we
> do for loose objects, or perhaps huge/[0-9a-f]{40}/ expecting that there
> won't be very many). The data can be stored unmodified as a file in that
> directory, with type stored in a separate file---that way, we won't have
> to compress, but we just copy. You still need to hash it at least once to
> come up with the object name, but that is what gives us integrity checks,
> is unavoidable and is not going to change.

Yeah. I think one of the bonuses there is that some filesystems are capable of referencing the same inodes in a copy-on-write way, so "add" and "checkout" cease to be copy operations and become inode-linking operations. That is a big win, both for speed and storage. I've had dreams of using hard links to do something similar, but it's just not safe enough without some filesystem-level copy-on-write protection.

-Peff
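The hard-link caveat can be made concrete with a toy sketch (all paths and names here are hypothetical). Linking avoids any data copy, but with only one inode behind both names, an in-place edit of the working-tree copy would corrupt the stored "object" too, which is exactly why copy-on-write reflinks would be needed for safety:

```python
import os
import tempfile

# "Add" a file to an object store by hard-linking instead of copying.
store = tempfile.mkdtemp(prefix="objects-")
work = tempfile.mkdtemp(prefix="worktree-")

src = os.path.join(work, "texture.bin")
with open(src, "wb") as f:
    f.write(b"\x00" * 1024)  # stand-in for a large binary asset

dst = os.path.join(store, "obj-texture")
os.link(src, dst)  # no data is copied; both names now share one inode

same_inode = os.stat(src).st_ino == os.stat(dst).st_ino

# Danger: truncating or rewriting src in place would silently change the
# bytes behind dst as well. A filesystem-level copy-on-write clone
# (reflink) gives the same cheap "copy" without that hazard.
```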
* Re: git for game development?
From: Michael Witten @ 2011-08-27 15:32 UTC
To: Junio C Hamano; Cc: Avery Pennarun, Jeff King, Lawrence Brett, git

On Wed, Aug 24, 2011 at 17:17, Junio C Hamano <gitster@pobox.com> wrote:

> Jeff King <peff@peff.net> writes:
>
>> I don't remember all of the details of bup, but if it's possible to
>> implement something similar at a lower level (i.e., at the layer of
>> packfiles or object storage), then it can be a purely local thing, and
>> the compatibility issues can go away.
>
> I tend to agree, and we might be closer than we realize.
>
> I suspect that people with large binary assets were scared away by rumors
> they heard second-hand, based on bad experiences other people had before
> any of the recent efforts made in various "large Git" topics, and they
> themselves haven't tried recent versions of Git enough to be able to tell
> what the remaining pain points are. I wouldn't be surprised if none of the
> core Git people tried shoving huge binary assets in test repositories with
> recent versions of Git---I certainly haven't.
>
> We used to always map the blob data as a whole for anything we do, but
> these days, with changes like your abb371a (diff: don't retrieve binary
> blobs for diffstat, 2011-02-19) and my recent "send large blob straight to
> a new pack" and "stream large data out to the working tree without holding
> everything in core while checking out" topics, I suspect that the support
> for local usage of large blobs might be sufficiently better than the old
> days.
>
> People toyed around with ideas to have a separate object store
> representation for large and possibly incompressible blobs. One possible
> implementation would be to add a new huge hierarchy under
> $GIT_DIR/objects/, compute the object name exactly the same way for huge
> blobs as we normally would (i.e. hash concatenation of object header and
> then contents) to decide which subdirectory under the "huge" hierarchy to
> store the data. The data can be stored unmodified as a file in that
> directory, with type stored in a separate file---that way, we won't have
> to compress, but we just copy. You still need to hash it at least once to
> come up with the object name, but that is what gives us integrity checks,
> is unavoidable and is not going to change.
>
> The sha1_object_info() layer can learn to return the type and size from
> such a representation, and you can further tweak the same places as the
> "streaming checkout" and the "checkin to a pack" topics touched to support
> such a representation.
>
> I would suspect that the local object representation is _not_ the largest
> pain point; such a separate object store representation is not buying us
> very much over a simpler "single large blob in a separate packfile", and
> if the counter-argument is "no, decompressing still costs a lot", then the
> real issue might be we decompress and look at the data when we do not have
> to (i.e. issues similar to what abb371a addressed).

I've added Avery to the Cc list, because he really needs to chime in here.

I am completely unqualified to comment on this, but I think it would be silly to ignore the insights Avery has about storing large objects; `bup' uses rolling checksums, a Bloom filter implementation, and who knows what else.
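For context on the rolling checksums mentioned above: bup splits big blobs with content-defined chunking, where a rolling checksum over a sliding window decides chunk boundaries, so an edit early in a huge file perturbs only nearby chunks. This toy sketch uses a simple additive rolling sum and small constants purely for illustration; it is not bup's actual rollsum or parameters:

```python
WINDOW = 16           # bytes in the rolling window (bup's is larger)
BOUNDARY_MASK = 0xFF  # declare a boundary when the low bits are all ones

def chunk(data: bytes):
    """Split data where a rolling sum over the last WINDOW bytes hits the mask."""
    chunks, start, rollsum = [], 0, 0
    for i, byte in enumerate(data):
        rollsum += byte
        if i >= WINDOW:
            rollsum -= data[i - WINDOW]  # slide the window forward one byte
        if (rollsum & BOUNDARY_MASK) == BOUNDARY_MASK and i + 1 - start >= WINDOW:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Because boundaries depend only on local window content, inserting bytes
# near the front of a file leaves most later chunk boundaries (and hence
# the hashes of most stored chunks) unchanged.
```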
* One MMORPG git facts
From: Marat Radchenko @ 2011-08-25 6:53 UTC
To: git

Lawrence Brett <lcbrett <at> gmail.com> writes:

> Hello,
>
> I am very interested in using git for game development. I will be working
> with a lot of binaries (textures, 3d assets, etc.) in addition to source
> files. I'd like to be able to version these files, but I understand that
> big binaries aren't git's forte.

Define "big".

I have one MMORPG here under Git: 250k revisions, 500k files in the working dir (7Gb), 200 commits daily, a 250Gb Git repo, and an SVN upstream repo of ~1Tb. Some facts:

1. It is unusable on a 32-bit machine (here and there it hits the memory limit for a single process).
2. It is unusable on Windows (because there's no 64-bit msysgit).
3. git status takes 3s with hot disk caches (7 mins with cold).
4. History traversal means really massive I/O.
5. Current setup: a 120Gb 10k rpm disk for everything but .git/objects/pack, and a separate 500Gb disk (to be upgraded to 1Tb soon) for packs.
6. git gc is PAIN. I do it on weekends because it takes more than a day to run. Also, the limits for git pack-objects should be configured VERY carefully; it can either run out of RAM or take weeks to run if configured improperly.
7. With default gc settings, git wants to gc daily (but gc takes more than a day, so if you follow its desire, you're in a gc loop). I set the objects limit to a very high value and invoke gc manually.
8. svn users cannot sensibly run status on the whole working copy (it takes more than 10 mins).
9. svn users only update with a nightly script (40 mins).
10. git commit takes several seconds because it writes a 70Mb commit file.
11. It is a good idea to run git status often so that working copy info isn't evicted from OS disk caches (remember, 3s vs 7 min).
12. Cloning the git repo is one more pain. We have a 100mbps network here, so fetching 250Gb takes some time. But worse, when cloning via the git:// protocol, git sits for several hours in the "Resolving deltas" stage after fetching. So rsync is used for initial cloning.
13. Here and there I hit scalability issues in various git commands, which I report to the mailing list and most of which get fixed (well, all except the one I reported this week).

Hope this helps to give an idea of how git behaves at a large scale. Overall, I'm happy with it and won't return to svn.
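The "set the objects limit very high and gc manually" arrangement from fact 7, together with bounds on pack-objects from fact 6, would look something like this in .git/config. The exact values here are illustrative guesses, not Marat's; `gc.auto`, `pack.windowMemory`, and `pack.threads` are real configuration keys:

```ini
[gc]
	# raise the loose-object threshold so automatic gc never triggers;
	# run `git gc` by hand on weekends instead
	auto = 100000000
[pack]
	# bound the memory used by delta search so pack-objects
	# cannot exhaust RAM during a repack
	windowMemory = 256m
	threads = 4
```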
* Re: One MMORPG git facts
From: J.H. @ 2011-08-25 7:57 UTC
To: Marat Radchenko; Cc: git

On 08/24/2011 11:53 PM, Marat Radchenko wrote:

> Lawrence Brett <lcbrett <at> gmail.com> writes:
>
>> Hello,
>>
>> I am very interested in using git for game development. I will be working
>> with a lot of binaries (textures, 3d assets, etc.) in addition to source
>> files. I'd like to be able to version these files, but I understand that
>> big binaries aren't git's forte.
>
> Define "big".
>
> I have one MMORPG here under Git. 250k revisions, 500k files in working dir
> (7Gb), 200 commits daily, 250Gb Git repo, SVN upstream repo of ~1Tb.

Given the differences, I'm morbidly curious: which actually ends up being the more usable version control system for a project of this scale? It sounds like (from what you've said) git is generally faster, assuming it can get enough resources (which can obviously be hard at the scales you're talking about).

- John 'Warthog9' Hawley
* Re: One MMORPG git facts
From: Marat Radchenko @ 2011-08-25 16:02 UTC
To: J.H.; Cc: git

On 08/25/2011 11:57:07 MSD, J.H. <warthog9@eaglescrag.net> wrote:

> Given the differences, I'm morbidly curious, which actually ends up
> being the more usable version control system of a project of this scale?
> It sounds like (from what you've said) git is generally faster,
> assuming it can get enough resources (which can obviously be hard at the
> scales your talking).

Hard to compare (especially because I don't have a pure git environment, but a git-svn clone), but I can give you some observations.

First, there are lots of non-geek people working on the MMORPG (quest designers, modellers, text writers, map designers). Many of them find it hard to understand DVCS concepts and prefer living with linear history in a single branch (svn trunk). Their work is highly isolated from each other (for example, maps are split into "regions" and only one person is allowed to edit a region at a time, only one modeller works on a particular model, and each quest has one person responsible for it), so they don't hit conflicts as often as programmers do. And since an svn up of the whole tree takes 40 mins, they don't update during the work day but have a nightly script for that, so the only thing they regularly use is svn commit.

Second, there's TortoiseSVN, which allows easy (for non-geeks) GUI history inspection.

Third, we have 200 commits per day (in 8 work hours); that's one commit every 2.4 mins (actually much less during lunch and much more in the morning, caused by the fact that programmers are not allowed to commit after 16:00), so your copy is outdated all the time. If the upstream repo were git, one would have to pull + push within those 2.4 mins, otherwise she would hit a non-fast-forward push. This could be fixed by using separate repos, though that would complicate the git setup even more.
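For what it's worth, the pull-then-push race described above is commonly softened by rebasing local work on pull instead of merging. This is a general mitigation, not something proposed in the thread; `branch.<name>.rebase` is a real git configuration key:

```ini
[branch "master"]
	# when `git pull` runs on master, rebase local commits on top of
	# the freshly fetched upstream so the following push fast-forwards
	rebase = true
```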
On the other hand, git is really great for programmers. Heck, svn still doesn't have anything like "git log -u" (well, afaik they finally added it in 1.7)! Stash, bisect, local commits, history rewriting, and cheap branching are very handy on a daily basis (almost no branching happens in the svn repo, because that involves either fetching 7Gb or, if svn switch is used, 10-30 mins of massive I/O). Also, git allows easy sharing of experimental changes between programmers without touching the shared server. It also allows atomic commits across the whole working copy (which is important when a programmer changes server, client, and data at the same time).

There are some decisions without which the repo could be smaller (for example, client/server binaries are committed daily so that designers can use them the next day); however, these decisions were made long before I joined the project and are very likely to stay this way.

To sum up: git is a wonderful (and very powerful) tool for programmers, but too complex for non-tech users.