* Avery Pennarun's git-subtree? @ 2010-07-21 17:15 Bryan Larsen 2010-07-21 19:43 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 58+ messages in thread From: Bryan Larsen @ 2010-07-21 17:15 UTC (permalink / raw) To: git I've been using Avery Pennarun's git-subtree (http://github.com/apenwarr/git-subtree) for a while now and have been finding it very useful and problem-free. Git submodules have been particularly problematic for me on a project which contains submodules which contain submodules. git-subtree "just works", without any futzing. We've also had problems with less git-savvy users dropping patches because they've occurred inside of a module. It would be really nice if git-subtree became a part of git. Avery has submitted git-subtree in the past and has indicated a willingness to do so again if there was a good chance of acceptance. Avery's announcement of v0.3 is also informative: http://kerneltrap.org/mailarchive/git/2010/2/4/22366 thank you, Bryan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-21 17:15 Avery Pennarun's git-subtree? Bryan Larsen @ 2010-07-21 19:43 ` Ævar Arnfjörð Bjarmason 2010-07-21 19:56 ` Avery Pennarun 0 siblings, 1 reply; 58+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-07-21 19:43 UTC (permalink / raw) To: Bryan Larsen; +Cc: git On Wed, Jul 21, 2010 at 17:15, Bryan Larsen <bryan.larsen@gmail.com> wrote: > I've been using Avery Pennarun's git-subtree > (http://github.com/apenwarr/git-subtree) for a while now and have been > finding it very useful and problem-free. > > Git submodules have been particularly problematic for me on a project which > contains submodules which contain submodules. git-subtree "just works", > without any futzing. > > We've also had problems with less git savvy users dropping patches because > they've occurred inside of a module. What sort of workflows do you find bad with git-submodule that are better with git-subtree? The submodule concept is simple, but a lot of the implementation is bad IMO. It doesn't integrate well, e.g. you have to remember to do git clone --recursive, or git clone and git submodule update --init after that, submodules don't remember what branch you wanted, so git submodule foreach 'git pull' doesn't DWYM (although I have a hack for that) etc. I've also wondered if we couldn't just store all the heads .gitmodules point to inside the main .git repository, and just git gc them when submodules are removed. I'd planned to maybe submit patches to fix some of these UI issues, knowing about more of them would help. I also haven't tried git-subtree. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-21 19:43 ` Ævar Arnfjörð Bjarmason @ 2010-07-21 19:56 ` Avery Pennarun 2010-07-21 20:36 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 58+ messages in thread From: Avery Pennarun @ 2010-07-21 19:56 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Bryan Larsen, git, Junio C Hamano On Wed, Jul 21, 2010 at 3:43 PM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > What sort of workflows do you find bad with git-submodule that are > better with git-subtree? > > The submodule concept is simple, but a lot of the implementation is > bad IMO. It doesn't integrate well, e.g. you have to remember to do > git clone --recursive, or git clone and git submodule update --init > after that, submodules don't remember what branch you wanted, so git > submodule foreach 'git pull' doesn't DWYM (although I have a hack for > that) etc. In my experience, there is exactly one killer problem with submodules that people are looking to solve with git-subtree: Branching. If you have a random developer in your office and they need to make a patch to one of your subprojects in the course of making their main project work, with submodules this requires incredibly error-prone contortions involving branching both projects, making sure you have push access to both projects, learning how to use git-submodule, etc. And then merging that branch into someone else's branch is complicated, particularly if they've also applied their own changes to the subproject. With git-subtree, that developer just commits the changes to the merged project - and that's it. Then you or someone else, who knows how git-subtree works, at any point in the future, can submit the subproject changes upstream, or not, as appropriate. No amount of bugfixing in git submodule can fix this workflow, because it's not a result of bugs. 
(The bugs, particularly the disconnected-by-default HEADs on submodule checkouts, do make it a bit worse :( ) It would require a fundamental redesign to make this work nicely with submodules. git-subtree is certainly a fundamental redesign. Arguably there might be even better ways to design it, of course. And submodules are good for certain other situations that git-subtree isn't, so it's obviously not a one-for-one replacement. If we can get some kind of consensus in principle that git-subtree is a good idea to merge into git core, I can prepare some patches and we can talk about the details. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-21 19:56 ` Avery Pennarun @ 2010-07-21 20:36 ` Ævar Arnfjörð Bjarmason 2010-07-21 21:09 ` Avery Pennarun 0 siblings, 1 reply; 58+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-07-21 20:36 UTC (permalink / raw) To: Avery Pennarun; +Cc: Bryan Larsen, git, Junio C Hamano On Wed, Jul 21, 2010 at 19:56, Avery Pennarun <apenwarr@gmail.com> wrote: > No amount of bugfixing in git submodule can fix this workflow, because > it's not a result of bugs. (The bugs, particularly the > disconnected-by-default HEADs on submodule checkouts, do make it a bit > worse :( ) It would require a fundamental redesign to make this work > nicely with submodules. I think most of those can be fixed, actually. The only requirement that the git plumbing imposes on git-submodules is that a "commit" entry exist in your tree, the rest is just (ugly plumbing). Thus, we could: * Hack git-submodule (or its replacement) to check import the tree that contains that "commit" into one central .git * Fix git status / git commit so that you could commit into submodules, i.e.: for each submodule in this-commit: chdir $submodule && commit done && cd $root && commit -m"bumping submodules" * Make git-push push the submodule contents and the superprojects. You'd just need to have commit access to the url listed in .gitmodules. What's missing from that (which would be nice) is the ability to check out a subdirectory from another repository. That could (I think) be done by just adding a normal "tree" entry, and then specifying that that tree can be found in git://... instead of the main tree. > If we can get some kind of consensus in principle that git-subtree is > a good idea to merge into git core, I can prepare some patches and we > can talk about the details. From having looked at it briefly it looks very nice. But it looks to me as if the main differences between git-submodule and git-subtree are in the porcelain, not the plumbing. 
It would be a lot less confusing to users of Git in the long term if we would at least try to unify these two approaches instead of having two mutually incompatible ways of doing essentially the same thing. ^ permalink raw reply [flat|nested] 58+ messages in thread
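The "commit into each submodule, then bump the superproject" loop proposed above can be approximated today with `git submodule foreach`. The following is only a minimal sketch under stated assumptions: the repository names `lib` and `super`, the demo identity, and the temporary directory are hypothetical, and `protocol.file.allow=always` is set only so that newer git versions permit cloning a submodule from a local path. Note that the loop aborts if any submodule has nothing to commit, which is one of the rough edges a real implementation would have to handle.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so the sketch runs without global git config.
ident='-c user.name=demo -c user.email=demo@example.com'

# A stand-in upstream project that will become the submodule.
git init -q lib
( cd lib && echo v1 > lib.txt && git add lib.txt \
    && git $ident commit -q -m 'lib: initial' )

# The superproject, with lib added as a submodule at path lib/.
git init -q super
cd super
git $ident -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git $ident commit -q -m 'add lib submodule'

# A change made inside the submodule's working tree.
echo v2 > lib/lib.txt

# Step 1: commit inside every submodule.
git submodule --quiet foreach \
  'git add -A && git -c user.name=demo -c user.email=demo@example.com commit -q -m "change inside submodule"'

# Step 2: record the new submodule commit in the superproject.
git add lib
git $ident commit -q -m 'bumping submodules'

git ls-tree HEAD lib
```

The sketch stops at the local commit. Pushing both histories, which the reply below treats as the hard part, is not covered here.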
* Re: Avery Pennarun's git-subtree? 2010-07-21 20:36 ` Ævar Arnfjörð Bjarmason @ 2010-07-21 21:09 ` Avery Pennarun 2010-07-21 21:20 ` Avery Pennarun ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-21 21:09 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Bryan Larsen, git, Junio C Hamano On Wed, Jul 21, 2010 at 4:36 PM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Wed, Jul 21, 2010 at 19:56, Avery Pennarun <apenwarr@gmail.com> wrote: >> No amount of bugfixing in git submodule can fix this workflow, because >> it's not a result of bugs. (The bugs, particularly the >> disconnected-by-default HEADs on submodule checkouts, do make it a bit >> worse :( ) It would require a fundamental redesign to make this work >> nicely with submodules. > [...] > I think most of those can be fixed, actually. The only requirement > that the git plumbing imposes on git-submodules is that a "commit" > entry exist in your tree, the rest is just (ugly plumbing). Sure. But this commit object (and the objects it points to) are never automatically pushed, fetched, or fsck'd. They're second class citizens. As it turns out, this was a major design mistake in implementing the submodule commit objects. All the behaviour people *currently* get from submodules could have been obtained without using a new 'commit' object type at all. Just add a commitid to the horrible junk (including repo URLs, argh) that already needs to get pasted into .gitmodules, and have git-commit at the top level update .gitmodules automatically (as it currently updates the 'commit' tree entries). Problem solved (at least, solved to exactly the extent that it is today). What we *really* want is a way to have git actually recurse through commit objects when doing *any* operation, as if they were tree objects. If we had that, submodules could be beautiful (because you'd push them to the same repo, etc and users would see none of the complexity). 
But this doesn't exist. And for backward compatibility at this point, we'd probably need to introduce an entirely new kind of tree entry to support such a thing. > Thus, we could: > > * Hack git-submodule (or its replacement) to check import the tree > that contains that "commit" into one central .git This part is relatively easy, I think - at least in concept, although I bet there would be widespread implementation tweaks - and would clean up a lot of the mess. However it would require a change to the .git/index file format to remember when a subdir is a commit and not a "normal" tree so that it doesn't silently commit the next thing as a tree instead. > * Fix git status / git commit so that you could commit into > submodules, i.e.: > > for each submodule in this-commit: > chdir $submodule && commit > done && cd $root && commit -m"bumping submodules" After making the earlier change to get rid of the extra .git subdirs, this next requirement would actually be considerably more work, because 'git commit' would need to know how to update a subcommit without changing HEAD. You certainly couldn't just code it up as a recursive "git commit" as you imply (and as you could do right now). > * Make git-push push the submodule contents and the > superprojects. You'd just need to have commit access to the url > listed in .gitmodules. This is really a *killer* problem, and you're making it sound easy. Let's imagine that my app has 25 different submodules - not unreasonable at all in a world with dozens of ever-changing ruby gems and suchlike. Now, if I want to branch my project, I might have to branch 25 projects just so I can push my changes? It's totally awful. And the awfulness is multiplied many times over if .gitmodules has hard-coded repo paths, because then I have to update the repo path in my branch but not the other branch, and merging will have conflicts. 
You might think that my .git/config could just override .gitmodules, but then some guy trying to fetch my branch will fail to fetch the submodules from my branch and get errors and have no idea what's going on. And you might think that using relative repo paths in .gitmodules would work, but that's only if I branched all 25 submodules in the *first* place. In real life, most subprojects point at the original project's home repo by default (because nobody thinks they'll be patching 25 subprojects when they start, and they're probably right), but then you have to individually change the URLs when you decide you need to patch them, and life gets complicated and ugly, especially when the next guy goes to fork your project and now needs to fork some subprojects but not others. There is no good solution to the submodule problem if each submodule has to go in its own repo. I've been thinking about this for years now, and watching lots of discussions about it on the git mailing list, and I just can't see any other option. All the submodules have to get pushed to and fetched from the same repo by default. Anything else is insane. One option might be to store the submodule commit refs as refs in your superproject. That wouldn't actually be so bad, except for the aforementioned problem that fetch/push/clone/etc don't actually trace through commit objects when deciding what objects to send you, so fetching the ref of the superproject wouldn't autofetch the subproject refs. Also, you could accidentally delete one of the subproject refs and lose tons of history without ever realizing it. That's error prone and confusing... and clutters up your repo refs list with administrative stuff you didn't actually want in the first place. > What's missing from that (which would be nice) is the ability to check > out a subdirectory from another repository. That could (I think) be > done by just adding a normal "tree" entry, and then specifying that > that tree can be found in git://... 
instead of the main tree. Actually that's already easy with submodules (and git-subtree makes it easy too, though slightly different). Just fetch the commit from the other repo, and do: git checkout FETCH_HEAD -- subdirname >> If we can get some kind of consensus in principle that git-subtree is >> a good idea to merge into git core, I can prepare some patches and we >> can talk about the details. > > From having looked at it briefly it looks very nice. But it looks to > me as if the main differences between git-submodule and git-subtree > are in the porcelain, not the plumbing. No. The fundamental difference is exactly one: git-subtree uses normal 'tree' entries (rather than commits) in its trees, so that all the git tools recurse through them like any other tree. Thus you don't need any extra refs, extra .git dirs, etc. That allows you to bypass all the useless behaviour git has around 'commit' entries. This is very much a plumbing difference. The git-submodule porcelain happens to independently be kind of annoying and inconvenient, but that would be much easier to fix if it weren't for the plumbing-related problems. > It would be a lot less confusing to users of Git in the long term if > we would at least try to unify these two approaches instead of having > two mutually incompatible ways of doing essentially the same thing. True. But I don't have the time, and implementing the new 'commit' entry semantics sounds like a lot of work (as opposed to arguing about them, which I guess I'm good at but which seems unproductive). In productive terms: git-subtree is solving problems for real users right now. It might solve more problems for more users if it were integrated into the core and thus made "official." Nothing precludes making submodules better later. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
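The plumbing difference described above (a normal 'tree' entry versus a 'commit' entry) can be seen directly in `git ls-tree` output. This is a minimal sketch, assuming a throwaway repository; the paths `vendored` and `linked` and the demo identity are hypothetical, and the commit id stored in the gitlink is simply this repository's own HEAD, reused for illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
# Hypothetical identity so the sketch runs without global git config.
ident='-c user.name=demo -c user.email=demo@example.com'

# A subdirectory stored the git-subtree way: an ordinary tree entry.
mkdir vendored
echo 'int x;' > vendored/x.c
git add vendored
git $ident commit -q -m 'vendored/ as a normal tree'

# A subdirectory stored the git-submodule way: a gitlink, i.e. a
# "commit" entry with mode 160000. Any commit id serves for the demo.
sha=$(git rev-parse HEAD)
git update-index --add --cacheinfo 160000,"$sha",linked

# Write the index out as a tree object and list its entries:
# "linked" appears as "160000 commit", "vendored" as "040000 tree".
tree=$(git write-tree)
git ls-tree "$tree"
```

Only the `vendored` entry points at objects that ordinary tree traversal descends into; the `linked` entry just names a commit id.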
* Re: Avery Pennarun's git-subtree? 2010-07-21 21:09 ` Avery Pennarun @ 2010-07-21 21:20 ` Avery Pennarun 2010-07-21 22:46 ` Jens Lehmann 2010-07-21 23:46 ` Ævar Arnfjörð Bjarmason 2 siblings, 0 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-21 21:20 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Bryan Larsen, git, Junio C Hamano On Wed, Jul 21, 2010 at 5:09 PM, Avery Pennarun <apenwarr@gmail.com> wrote: > All the submodules have > to get pushed to and fetched from the same repo by default. Anything > else is insane. ...and just to clarify, by far the least insane option here is to have the whole thing all under a single ref, which is currently impossible with submodules. >> What's missing from that (which would be nice) is the ability to check >> out a subdirectory from another repository. That could (I think) be >> done by just adding a normal "tree" entry, and then specifying that >> that tree can be found in git://... instead of the main tree. > > Actually that's already easy with submodules (and git-subtree makes it > easy too, though slightly different). Just fetch the commit from the > other repo, and do: > > git checkout FETCH_HEAD -- subdirname Sorry, that's not right. You can use this instead for roughly the effect you want: git read-tree --prefix subdirname FETCH_HEAD: && git checkout subdirname Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
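The corrected `read-tree` recipe above can be tried end to end as follows. This is a sketch under assumptions: the repositories `lib` and `app`, the file names, and the demo identity are hypothetical, and the prefix is written with a trailing slash (`--prefix=subdirname/`) because some git versions require that form.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so the sketch runs without global git config.
ident='-c user.name=demo -c user.email=demo@example.com'

# The "other repo" whose content we want as a subdirectory.
git init -q lib
( cd lib && echo 'hello from lib' > lib.txt && git add lib.txt \
    && git $ident commit -q -m 'lib: initial' )

# The main project.
git init -q app
cd app
echo 'app' > README
git add README
git $ident commit -q -m 'app: initial'

# Fetch the other repo's commit, read its root tree into the index
# under subdirname/, then populate the working tree from the index.
git fetch -q ../lib HEAD
git read-tree --prefix=subdirname/ FETCH_HEAD:
git checkout -- subdirname
git $ident commit -q -m 'import lib under subdirname/'

git ls-tree HEAD
```

The result is a plain tree entry at `subdirname`, so no `.gitmodules` file or extra `.git` directory is involved.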
* Re: Avery Pennarun's git-subtree? 2010-07-21 21:09 ` Avery Pennarun 2010-07-21 21:20 ` Avery Pennarun @ 2010-07-21 22:46 ` Jens Lehmann 2010-07-22 1:09 ` Avery Pennarun 2010-07-21 23:46 ` Ævar Arnfjörð Bjarmason 2 siblings, 1 reply; 58+ messages in thread From: Jens Lehmann @ 2010-07-21 22:46 UTC (permalink / raw) To: Avery Pennarun Cc: Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano On 21.07.2010 23:09, Avery Pennarun wrote: > What we *really* want is a way to have git actually recurse through > commit objects when doing *any* operation, as if they were tree > objects. This would not be useful for every work flow (or to put it in other words: this is not what I *really* want ;-). And as you pointed out, that only works when you have a single repo you are working against (like you do in your subtree model). But unless I got something wrong (which might very well be the case, as I have never used subtree myself), all changes to the subtree will only show up in that single repo, unless you actively push them somewhere else. And that, it seems to me, is as easy to forget as it is right now to forget to push a submodule's commit you already recorded and pushed in the superproject. So am I wrong assuming that subtree is more focused on a single repo containing all commits which /might/ then be shared, while submodules are about /always/ sharing code via their own repo? > There is no good solution to the submodule problem if each submodule > has to go in its own repo. I've been thinking about this for years > now, and watching lots of discussions about it on the git mailing > list, and I just can't see any other option. All the submodules have > to get pushed to and fetched from the same repo by default. Anything > else is insane. I have to object here. 
Your insanity is someone else's work flow ;-) And I am the last one not to admit that there are some severe usability warts still to be fixed for submodules (I put up a - not necessarily complete - list at http://wiki.github.com/jlehmann/git-submod-enhancements/ ). And myself and others are actively working on them (the next bigger thing after a new config option about when to consider a submodule modified are recursive checkouts, so that "git submodule update" will hopefully be almost obsolete in the near future). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-21 22:46 ` Jens Lehmann @ 2010-07-22 1:09 ` Avery Pennarun [not found] ` <m31vavn8la.fsf@localhost.localdomain> 0 siblings, 1 reply; 58+ messages in thread From: Avery Pennarun @ 2010-07-22 1:09 UTC (permalink / raw) To: Jens Lehmann Cc: Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano On Wed, Jul 21, 2010 at 6:46 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote: > Am 21.07.2010 23:09, schrieb Avery Pennarun: >> What we *really* want is a way to have git actually recurse through >> commit objects when doing *any* operation, as if they were tree >> objects. > > This would not be useful for every work flow (or to put it in other > words: this is not what I *really* want ;-). And as you pointed > out, that only works when you have a single repo you are working > against (like you do in your subtree model). But you see, the utter failure of the way git-submodule works is that it required a change to the git repository format, but that repository format change resulted in absolutely *zero* improvement. The tree object of the parent points at 'commit xxxx'. But everything in git has been *specially modified* to *just ignore* that 'commit xxxx'. It would have given exactly the same functionality - and much less confusingly - if .gitmodules would just include the desired commitid of the child project. You could still have the same 'git submodule' command with the same syntax and semantics. And it wouldn't have bastardized the git repo format. It would have been just as good to just dump something into your Makefile to go 'git clone' the subprojects from somewhere before building. Seriously, it would be one or two lines of code; all of git-submodule replaces about one or two lines of code in your Makefile. And you know what? 
If I just used that one or two lines of code, I'd have all sorts of flexibility in where the subprojects get cloned from, which I currently don't have, and which is the insanity that drove me to write git-subtree in the first place. HOWEVER I'm not saying we can change that now. I'm not suggesting that this feature can be safely removed or changed at all. Furthermore, I totally agree that having large subprojects *not* be in your repo is sometimes a good idea. I just think it was actually a bad idea to intrusively add support to git to implement this when it could have been done without modifying git at all. I also believe that the vast majority of people who use git-submodules would rather have it work differently. (Again, this is not to subtract functionality. The existing functionality is useful sometimes.) > But unless I got something wrong (which might very well be the > case, as I never have used subtree myself), all changes to the > subtree will only show up in that single repo, unless you actively > push them somewhere else. And that, it seems to me, is as easy to > forget as you can right now forget to push a submodules commit you > already recorded and pushed in the superproject). So am I wrong > assuming that subtree is more focused on a single repo containing > all commits which /might/ then be shared, while submodules are > about /always/ sharing code via their own repo? Yes, this is absolutely intentional. It also matches exactly with everything else in the git repo philosophy! I make my own clone. I mess with it, I fiddle with it, I make 17 clones on my local machine, I throw away what I don't like, I pull merge, I rebase, and then *eventually* I submit *some* of my patches upstream. git-subtree lets you do all those things. git-submodule stomps on you repeatedly if you try. 
To wit: - cloning a local supermodule on my local machine to another copy: every call to 'git submodule update' re-downloads submodule repos from the remote machine, because the submodule path is hardcoded to point at a remote machine. Better still, if I've modified any of my subprojects without pushing changes upstream, the clone will fail, because the new copy of the superproject will have no access to my subproject's patches. (If .gitmodules supplies a relative path, it's even worse, because my 'origin' in the new copy is now pointing to a local folder, not a remote one, and all the submodules don't exist there.) - branching a local supermodule on my local machine: fails to branch the submodule automatically and makes it super easy to lose patches altogether (since by default, they're committed to a detached HEAD). - pulling/merging: always causes a conflict if local and remote have modified the same submodule. - rebasing: always causes a conflict if local and remote have modified the same submodule. Also requires you to rebase submodules separately from the supermodule. (Yes, this happens often in real life.) - submitting upstream: requires me to have a separate repo that's a copy of the upstream repo, and to manage at least one subrepo branch for every superproject branch, just to track my submissions. With git-subtree, no extra repos are necessary. It's very clear that git-submodule's current behaviour totally mismatches the entire git philosophy. That's why it's so impossible to make the git-submodule command usable. Another mental exercise: try to think of any other part of git where it would be considered remotely acceptable to put the absolute or relative URL of one repo inside another repo. git URLs are an implementation detail of clone/fetch/push/pull. The *content* that git manages should not have to deal with that stuff. With git-submodule, it has to. With git-subtree, it doesn't. 
>> There is no good solution to the submodule problem if each submodule >> has to go in its own repo. I've been thinking about this for years >> now, and watching lots of discussions about it on the git mailing >> list, and I just can't see any other option. All the submodules have >> to get pushed to and fetched from the same repo by default. Anything >> else is insane. > > I have to object here. Your insanity is someone else's work flow ;-) Sorry. I was being a little hyperbolic. Some people might want to use multiple repos for certain things - but I believe those people are much more rare than the kind who want to do it my way. And furthermore, even those people would probably actually like it better if *most* of their subprojects - the smallish ones - could be all in one repo. Even if you like multiple repos, I'm sure you don't like being *forced* to manually fork multiple repos just to fork a single superproject. I'm sure you don't like updating .gitmodules to change the absolute URL of a submodule, and then getting merge conflicts when someone else had to do the same thing. There's no way you like that. If you like that, then you really are insane. :) > And I am the last one not to admit that there are some severe > usability warts still to be fixed for submodules (I put up a - not > necessarily complete - list at > http://wiki.github.com/jlehmann/git-submod-enhancements/ ). And > myself and others are actively working on them (the next bigger > thing after a new config option about when to consider a submodule > modified are recursive checkouts, so that "git submodule update" > will hopefully be almost obsolete in the near future). I don't believe you can fix git-submodule by fixing surface warts. It's fundamentally broken. Since we're stuck with supporting the current behaviour until the end of time, fixing the surface warts might be necessary and even mildly helpful. 
It will also be soul sucking since no matter how hard you try, people will still hate the result. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
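The point above about repository URLs living inside tracked content can be illustrated with plain `git config`. This is a minimal sketch, assuming a throwaway superproject; the submodule name `libfoo` and both `git://example.com/...` URLs are hypothetical placeholders. It shows only where the two URL settings live, not a working submodule checkout.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q super
cd super

# .gitmodules is an ordinary tracked file: the URL written here becomes
# part of the content that every clone, branch, and merge carries along.
git config -f .gitmodules submodule.libfoo.path libfoo
git config -f .gitmodules submodule.libfoo.url git://example.com/upstream/libfoo
git add .gitmodules

# A per-clone setting lives in .git/config instead. It is never
# committed, so other people who fetch this branch do not see it.
git config submodule.libfoo.url git://example.com/me/libfoo

echo "tracked: $(git config -f .gitmodules --get submodule.libfoo.url)"
echo "local:   $(git config --get submodule.libfoo.url)"
```

Changing the tracked URL on one branch and not another is what sets up the `.gitmodules` merge conflicts complained about above, while changing only the local one leaves everyone else pointing at a repository that may lack the needed commits.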
[parent not found: <m31vavn8la.fsf@localhost.localdomain>]
* Re: Avery Pennarun's git-subtree? [not found] ` <m31vavn8la.fsf@localhost.localdomain> @ 2010-07-22 18:23 ` Bryan Larsen 2010-07-24 22:36 ` Jakub Narebski 2010-07-22 19:41 ` Avery Pennarun 1 sibling, 1 reply; 58+ messages in thread From: Bryan Larsen @ 2010-07-22 18:23 UTC (permalink / raw) To: Jakub Narebski Cc: Avery Pennarun, Jens Lehmann, Ævar Arnfjörð Bjarmason, git, Junio C Hamano, Linus Torvalds > > Using git-subtree has its warts too: I don't think for example that there is > a way to get a log _automatically excluding_ history subtree-merged > subprojects. Or is it there? > It works exactly right for me when I used git-subtree in "squashed" mode. Changes which were done in tree show up separately in the log, changes which were pulled in via git-subtree pull show up as a single summary entry in the log. This discussion has been about how to improve git submodules, which is sorely needed. However, it's quite clear that git submodules will never work as well as git subtrees in certain quite common situations. If fixed, git submodules will be more appropriate in other situations. However, I'm not asking to remove git submodules or prevent anybody from fixing them, I'm just asking that git subtree be merged. Does anybody actually oppose the merger of git-subtree, which has (at least) hundreds of users despite its out-of-tree status? thanks, Bryan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 18:23 ` Bryan Larsen @ 2010-07-24 22:36 ` Jakub Narebski 0 siblings, 0 replies; 58+ messages in thread From: Jakub Narebski @ 2010-07-24 22:36 UTC (permalink / raw) To: Bryan Larsen Cc: Avery Pennarun, Jens Lehmann, git, Junio C Hamano, Linus Torvalds, Ævar Arnfjörð Bjarmason Dnia czwartek 22. lipca 2010 20:23, Bryan Larsen napisał: > > > > Using git-subtree has its warts too: I don't think for example that there is > > a way to get a log _automatically excluding_ history subtree-merged > > subprojects. Or is it there? > > > > It works exactly right for me when I used git-subtree in "squashed" > mode. Changes which were done in tree show up separately in the log, > changes which were pulled in via git-subtree pull show up as a single > summary entry in the log. > > This discussion has been about how to improve git submodules, which is > sorely needed. However, it's quite clear that git submodules will > never work as well as git subtrees in certain quite common situations. > If fixed, git submodules will be more appropriate in other situations. > However, I'm not asking to remove git submodules or prevent anybody > from fixing them, I'm just asking that git subtree be merged. > > Does anybody actually oppose the merger of git-subtree, which has (at > least) hundreds of users despite its out-of-tree status? I am very much *for* merging git-subtree into git core. It is not that much different from e.g. "git submodule" or "git remote" porcelain commands. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? [not found] ` <m31vavn8la.fsf@localhost.localdomain> 2010-07-22 18:23 ` Bryan Larsen @ 2010-07-22 19:41 ` Avery Pennarun 2010-07-22 19:56 ` Jonathan Nieder ` (3 more replies) 1 sibling, 4 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-22 19:41 UTC (permalink / raw) To: Jakub Narebski Cc: Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Thu, Jul 22, 2010 at 5:57 AM, Jakub Narebski <jnareb@gmail.com> wrote: > Avery Pennarun <apenwarr@gmail.com> writes: >> The tree object of the parent points at 'commit xxxx'. But everything >> in git has been *specially modified* to *just ignore* that 'commit >> xxxx'. It would have given exactly the same functionality - and much >> less confusingly - if .gitmodules would just include the desired >> commitid of the child project. You could still have the same 'git >> submodule' command with the same syntax and semantics. And it >> wouldn't have bastardized the git repo format. > > Actually the prototype implementation by Martin Waitz worked in such way, > i.e. with special file in top directory holding SHA-1 of submodule commits, > what you can read on https://git.wiki.kernel.org/index.php/SubprojectSupport > page. > > The low level plumbing with 'commit' entries in the 'tree' object was > created by Linus Torvalds (CC-ed). I don't remember discussion about why > this solution was chosen, though. But please read about differences between > git-subtree and git-submodule below. I actually think Linus's contribution - the particular change to the repo format to have trees link to commits - was exactly right. If we want to talk about failings of git-subtree, they all precisely come down to the fact that, because it has tree->tree links instead of tree->commit links, it has to stash commitid information in the commit message, which is gross and error prone. 
git-subtree would have benefitted from tree->commit links, but because git's implementation of them is broken, that wasn't an option. Unfortunately everything built *on top of* Linus's file format contribution has turned out to be a disaster. Actually making the subprojects have their own local .git repositories was a disaster, for exactly the same reasons that having every subdir in svn have its own .svn directory (or in cvs, every directory has its own CVS directory) is a disaster. When you split things up that way, you can't easily do global atomic operations across the entire set of content. And you can accidentally have a subdir pointing at a totally different place than the parent thinks it is. And you have CVS/.svn/.git directories cluttering stuff up everywhere. The tree->commit links do not preclude you doing wonderful global atomic operations across the entire set of content. The separate repository garbage absolutely does. >> To wit: >> >> - cloning a local supermodule on my local machine to another copy: >> every call to 'git submodule update' re-downloads submodule repos from >> the remote machine, because the submodule path is hardcoded to point >> at a remote machine. > > Errrr... the URL to submodule repository (I guess it is what you meant here > by "submodule path") in the config file overrides URL to submodule > repository in '.gitmodules' for a reason. So the plumbing support is here, > it is only failing of an UI that we don't have '--recursive-local' or > '--convert-submodules' (like '--convert-links' in wget) in "git clone". Let me be more specific. I create an app named myapp on github: git://github.com/apenwarr/myapp It uses 17 different ruby gems, which I import as subprojects. 
I have two choices: [1] .gitmodules can use absolute paths to the original gem locations: git://github.com/rubygems/gem[1..n] [2] Or else I can fork them all and use relative paths in .gitmodules: ../gem[1..n] translates to --> git://github.com/apenwarr/gem[1..n] At this phase, both options are okay (though option #2 is obviously much more work). My next step will be to clone myapp onto my local machine: git clone --recursive git://github.com/apenwarr/myapp And it will grab all the submodules just fine. Now let's say I want to change gem13. If I used option #1, I have to now go fork gem13 on github. Then do one of the following: [1A] Re-point my .git/config file to point at the new submodule location, git://github.com/apenwarr/gem13 but leave .gitmodules alone [1B] or update both .git/config and .gitmodules If I do #1A, then when I push my changes, the *next* guy who clones git://github.com/apenwarr/myapp will fail; the gem13 link in myapp points at a commit that is *only* in apenwarr/gem13, not rubygems/gem13. If I do #1B, then if someone else does something similar in their own copy and pulls from me, we will have a conflict in .gitmodules. In both cases, if two people need to patch gem13 during their changes to myapp, merges will fail because there is no submodule-recursive merge (and trying to write one would be incredibly hard since it would have to communicate across sub-repositories). So if you do #1, then I don't know of any options other than #1A and #1B, and neither one works. Now, if I had done #2 instead, things are a little better, because we're using relative paths in .gitmodules so when the second guy clones a copy of myapp, he can also clone a copy of all 17 gems, and all the paths will still work. When the second guy does 'git pull apenwarr myapp' it will still fail, though; it will try to get the latest gem13 from ../gem13 --> secondguy/gem13, when actually the required commits are in apenwarr/gem13. 
Furthermore, 'git clone --recursive myapp myapp2' will totally fail, because it will then expect gem[1..n] to all be in separate local directories at the same level as myapp, which they aren't. (You might be saying: what do you need that for? Well, I rarely do. But sometimes. And as long as I don't use git-submodule, it works fine.) You can fix warts all day long. You can't make it work, because it's not just warts; the insides are rotten. >> - branching a local supermodule on my local machine: fails to branch >> the submodule automatically and makes it super easy to lose patches >> altogether (since by default, they're committed to a detached HEAD). > > That's UI problem, too. Though I guess that using detached HEAD was > chosen because it is simplest solution. I've seen the discussion about submodule branch names go by on the git list a few times, and I participated once or twice. The current option was certainly chosen because it's the simplest; unfortunately, it's also non-functional, and all the other options are also awful. Here it is in a nutshell: if I'm branching myapp, I already have a branch that I want to store all my changes under; it's the branch I'm working on in myapp. That's not to say I want that same myapp branch name *in my gem13 repository*; my branchname is probably something like add-feature-to-myapp, which has nothing to do with gem13. The changes required to gem13 to implement add-feature-to-myapp are probably just a tiny bugfix or config option. gem13 doesn't know anything about myapp. The upstream gem13 maintainers certainly don't care about myapp. As a guy who *just wants to get work done on myapp right now*, thinking about what to name my trivial one-patch temporary branch gem13 is a *waste of time*. I don't *want* my gem13 changes to have a branchname. So the disconnected HEAD is the right answer then, right? No! The default disconnected HEAD makes it *far* too easy to lose my changes.
I don't want to name my branch, but I *have* to, because I *have* to push it somewhere separately, because if I don't, then my changes to myapp will be useless to everyone who tries to pull from me. The question of what to name the submodule branch is unanswerable because it's the wrong question. > Otherwise you would have either > put submodule branch name in '.gitmodules' (but that's contrary to git > philosophy that branches are ephemeral and branch names are local matter), Surely including *repository URLs* inside the *repository content* is at least as bad as including branch names. If we're going to do one, we might as well do the other. But it won't help, because the stored branch name will probably be 'master', and my personal hacked-up copy of gem13 shouldn't be on a branch named master anyway. >> - pulling/merging: always causes a conflict if local and remote have >> modified the same submodule. >> >> - rebasing: always causes a conflict if local and remote have modified >> the same submodule. Also requires you to rebase submodules separately >> from the supermodule. (Yes, this happens often in real life.) > > That's a matter of UI, and lack of merge strategy that can merge > submodules... although if I remember correctly there was some preliminary or > proof of concept work on submodule-aware merge strategy. > > "git merge" and "git rebase" would have to acquire '--recursive' option. > Currently you probably need to use 'git submodule foreach ...', I guess. Merge and rebase are actually very different here. Merging is something I might expect to work across submodules eventually; rebasing is much less obvious, because successive versions of myapp might actually be jumping back and forth between versions of gem13. Then what does it mean to auto-rebase gem13 when you're rebasing myapp? You should check out git-subtree --squash here; it's quite interesting and makes rebasing easy, even if the subtree version is alternating back and forth. 
I'm not sure how you'd map it onto git-submodule, though, even if git-submodule weren't broken. >> - submitting upstream: requires me to have a separate repo that's a >> copy of the upstream repo, and to manage at least one subrepo branch >> for every superproject branch, just to track my submissions. With >> git-subtree, no extra repos are necessary. > > NOTE that it is important design decision to have by default separate object > storage for submodules. I certainly won't deny that :) This discussion is about whether it was the right decision. > First, this allow to not clone submodule, and do not download its objects. > This is *impossible* with git-subtree (with using 'subtree' merge strategy). > I'm not sure how commonly this feature is used in real life, but somebody > here in this thread gave example of submodule with arts, which is large > because it contains large / many binary files, while being required to have > only for some. > > Second, from what I remember this was implemented also for performance > reasons... though I don't remember reasoning used. I think this ended up being a terrible mistake. The problems you identify come down to this: 1) Sometimes I want to clone only some subdirs of a project 2) Sometimes I don't want the entire history because it's too big. 3) Super huge git repositories start to degrade in performance. (Actually #3 isn't really a problem as far as I've ever seen, and bup stores hundreds of gigs, including trees that reference millions of blobs, in a single git repo without dying. But okay, maybe this is a problem sometimes for some types of operations.) These problems come up regardless of whether you're using submodules. The hard truth of the matter is that people are using submodules to try to solve these problems, but they were never caused by the lack of submodules in the first place. When I clone the Linux kernel, sometimes I just don't want the entire history.
That's why people invented shallow clones (although last time I checked, they were still a little half-assed). When I clone KDE, sometimes I don't want all the subprograms; sometimes I do. That's why people invented sparse checkouts, and why (I think) it would be nice to have sparse clones as well (where you don't even download the objects for subtrees you don't care about). There is simply not a clear path from "my repo is too big" to "all my problems will be solved if git-submodule is implemented correctly." The truth is, problems 1-3 are easily solvable by improving the git implementation, without any change in architecture and without requiring people to lay out their projects differently. The *real* need for submodules - the need you can't fix without submodules - has nothing to do with these requirements. It's about each submodule wanting to have its own lifecycle, owner, changelog, and release process, and - perhaps this is actually the killer requirement - each supermodule wanting to be able to cleanly rewind a submodule if they don't like the new version. >> It's very clear that git-submodule's current behaviour totally >> mismatches the entire git philosophy. That's why it's so impossible >> to make the git-submodule command usable. > > That's very strong accusation. Agreed... but that doesn't make it wrong :) > Using git-subtree has its warts too: I don't think for example that there is > a way to get a log _automatically excluding_ history subtree-merged > subprojects. Or is it there? There's git-subtree merge --squash. It's pretty cool. Also insane and not as good as real tree->commit links. I will gladly admit to git-subtree's warts.

> Submodule                          | Subtree
> -----------------------------------+----------------------------------
> must clone recursively submodules; | automatically gets all subtrees

Yup.

> can not clone some submodules      | cannot leave out some subtree, but
>                                    | nowadays can not checkout it

I don't understand what you mean on the right-hand side here. FWIW, subtree forces you to always checkout the entire thing (unless you use git sparse checkouts, I guess; maybe that's what you mean).

> rebase and merge needs separate    | rebase and merge works normally
> work in submodule currently        |

True.

> easy to send updates upstream      | need not to worry about submodule
> to submodule repo                  | repository

It's actually easy to send subtree updates upstream with the new 'git subtree push' command, which was contributed recently. Or you can send them via format-patch if you use 'git subtree split'. It's one line more of typing than doing it on a submodule repo, and that one line is greatly offset by the hugely reduced typing by not using submodules. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
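The subtree commands mentioned at the end of this message ('git subtree split', 'git subtree push', and the --squash option) can be sketched roughly as follows. This is only an illustration under assumptions: the two throwaway repositories, the vendor/gem13 prefix, the file names, and the branch names gem13-export and fix-from-myapp are invented here, and it assumes the git-subtree command is installed alongside git.

```shell
#!/bin/sh
# Sketch: import gem13 as a squashed subtree, patch it inside myapp,
# then extract that patch for upstream. Repos and file names are made up.
set -e
work=$(mktemp -d)

# Stand-in for the upstream gem13 repository.
git init -q "$work/gem13"
cd "$work/gem13"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
echo "v1" > lib.rb
git add lib.rb
git commit -q -m "gem13: initial version"

# The superproject.
git init -q "$work/myapp"
cd "$work/myapp"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
echo "app" > app.rb
git add app.rb
git commit -q -m "myapp: initial version"

# Import gem13 under vendor/gem13 as a single squashed commit.
git subtree add --prefix=vendor/gem13 --squash "$work/gem13" main

# Fix something in gem13 while working on myapp; it is an ordinary commit.
echo "v1 with fix" > vendor/gem13/lib.rb
git commit -q -am "gem13: tiny bugfix needed by myapp"

# Either rewrite the subtree history onto a local branch and mail a patch...
git subtree split --prefix=vendor/gem13 -b gem13-export
git format-patch -1 -o "$work/patches" gem13-export

# ...or push the same rewritten history to a branch in the gem13 repository.
git subtree push --prefix=vendor/gem13 "$work/gem13" fix-from-myapp
```

Note that the gem13 fix never needed a branch name inside myapp; a name only appears at the moment the change is exported.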
* Re: Avery Pennarun's git-subtree? 2010-07-22 19:41 ` Avery Pennarun @ 2010-07-22 19:56 ` Jonathan Nieder 2010-07-22 20:06 ` Avery Pennarun ` (2 more replies) 2010-07-23 8:31 ` Chris Webb ` (2 subsequent siblings) 3 siblings, 3 replies; 58+ messages in thread From: Jonathan Nieder @ 2010-07-22 19:56 UTC (permalink / raw) To: Avery Pennarun Cc: Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds Avery Pennarun wrote: > Unfortunately everything built *on top of* Linus's file format > contribution has turned out to be a disaster. Aside: this kind of statement might make it unlikely for exactly those who would benefit most from your opinions to read them. Well, that is my guess, anyway. I know that I have not found the time to read your email (though I would like to) because I suspect based on such sweeping statements that it would take a while to separate the useful part from the rest. Of course I am glad to see people thinking about these issues. My comment is only about how the results get presented. Jonathan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 19:56 ` Jonathan Nieder @ 2010-07-22 20:06 ` Avery Pennarun 2010-07-22 20:17 ` Ævar Arnfjörð Bjarmason 2010-07-22 20:43 ` Elijah Newren 2 siblings, 0 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-22 20:06 UTC (permalink / raw) To: Jonathan Nieder Cc: Jakub Narebski, Jens Lehmann, Ævar Arnfjörð, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Thu, Jul 22, 2010 at 3:56 PM, Jonathan Nieder <jrnieder@gmail.com> wrote: > Avery Pennarun wrote: >> Unfortunately everything built *on top of* Linus's file format >> contribution has turned out to be a disaster. > > Aside: this kind of statement might make it unlikely for exactly > those who would benefit most from your opinions to read them. > > Well, that is my guess, anyway. I know that I have not found the time > to read your email (though I would like to) because I suspect based on > such sweeping statements that it would take a while to separate the > useful part from the rest. Unfortunately you will find that the rest of my email more or less just expands in detail on those sweeping statements. Sorry. Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 19:56 ` Jonathan Nieder 2010-07-22 20:06 ` Avery Pennarun @ 2010-07-22 20:17 ` Ævar Arnfjörð Bjarmason 2010-07-22 21:33 ` Avery Pennarun 2010-07-22 20:43 ` Elijah Newren 2 siblings, 1 reply; 58+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-07-22 20:17 UTC (permalink / raw) To: Jonathan Nieder Cc: Avery Pennarun, Jakub Narebski, Jens Lehmann, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Thu, Jul 22, 2010 at 19:56, Jonathan Nieder <jrnieder@gmail.com> wrote: > Avery Pennarun wrote: > >> Unfortunately everything built *on top of* Linus's file format >> contribution has turned out to be a disaster. > > Aside: this kind of statement might make it unlikely for exactly > those who would benefit most from your opinions to read them. > > Well, that is my guess, anyway. I know that I have not found the time > to read your email (though I would like to) because I suspect based on > such sweeping statements that it would take a while to separate the > useful part from the rest. > > Of course I am glad to see people thinking about these issues. > My comment is only about how the results get presented. Well, it's not like Linus is the image of calmness when attacking something he perceives as crap design either >:) Anyway, to answer Bryan's question. My comments in previous messages shouldn't be interpreted as opposition to git-subtree being merged at all. It's clearly very useful, especially for cases where git-submodule is wanting. I'd be happy to review a patch that integrated it into the Git tree. But it's also clear that we have a lot of tribal knowledge about the lackings of git submodule / git subtree. It would be *really* useful if people like Avery and Jens which have obviously thought hard about the submodule/subtree issues would draft up some (calmly written) docs about how the two differ (with comparison tables etc.). That'd be a very helpful resource for Git users in deciding which one to use. 
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 20:17 ` Ævar Arnfjörð Bjarmason @ 2010-07-22 21:33 ` Avery Pennarun 2010-07-23 15:10 ` Jens Lehmann 2010-07-26 17:34 ` Eugene Sajine 0 siblings, 2 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-22 21:33 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Jonathan Nieder, Jakub Narebski, Jens Lehmann, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Thu, Jul 22, 2010 at 4:17 PM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > But it's also clear that we have a lot of tribal knowledge about the > lackings of git submodule / git subtree. It would be *really* useful > if people like Avery and Jens which have obviously thought hard about > the submodule/subtree issues would draft up some (calmly written) docs > about how the two differ (with comparison tables etc.). > > That'd be a very helpful resource for Git users in deciding which one > to use. I think I'm too biased to write that, but if someone else wants to take the lead, I could certainly contribute. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 21:33 ` Avery Pennarun @ 2010-07-23 15:10 ` Jens Lehmann 2010-07-26 17:34 ` Eugene Sajine 1 sibling, 0 replies; 58+ messages in thread From: Jens Lehmann @ 2010-07-23 15:10 UTC (permalink / raw) To: Avery Pennarun Cc: Ævar Arnfjörð Bjarmason, Jonathan Nieder, Jakub Narebski, Bryan Larsen, git, Junio C Hamano, Linus Torvalds Am 22.07.2010 23:33, schrieb Avery Pennarun: > On Thu, Jul 22, 2010 at 4:17 PM, Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> But it's also clear that we have a lot of tribal knowledge about the >> lackings of git submodule / git subtree. It would be *really* useful >> if people like Avery and Jens which have obviously thought hard about >> the submodule/subtree issues would draft up some (calmly written) docs >> about how the two differ (with comparison tables etc.). >> >> That'd be a very helpful resource for Git users in deciding which one >> to use. > > I think I'm too biased to write that, but if someone else wants to > take the lead, I could certainly contribute. While I don't consider myself biased, I just don't know enough about the details of the subtree approach to write that. But I would certainly contribute to the submodule side of such a document too. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 21:33 ` Avery Pennarun 2010-07-23 15:10 ` Jens Lehmann @ 2010-07-26 17:34 ` Eugene Sajine 1 sibling, 0 replies; 58+ messages in thread From: Eugene Sajine @ 2010-07-26 17:34 UTC (permalink / raw) To: Avery Pennarun Cc: Ævar Arnfjörð Bjarmason, Jonathan Nieder, Jakub Narebski, Jens Lehmann, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Thu, Jul 22, 2010 at 5:33 PM, Avery Pennarun <apenwarr@gmail.com> wrote: > On Thu, Jul 22, 2010 at 4:17 PM, Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> But it's also clear that we have a lot of tribal knowledge about the >> lackings of git submodule / git subtree. It would be *really* useful >> if people like Avery and Jens which have obviously thought hard about >> the submodule/subtree issues would draft up some (calmly written) docs >> about how the two differ (with comparison tables etc.). >> >> That'd be a very helpful resource for Git users in deciding which one >> to use. > > I think I'm too biased to write that, but if someone else wants to > take the lead, I could certainly contribute. > > Have fun, > > Avery I personally tried to understand submodules, but my attempts to find an easy way to use them have failed miserably ;) Probably I have to spend even more time in order to understand if I can benefit from them or not. So, I think this kind of comparison would be very beneficial for "mere mortals". I would like to share an idea how it can be organized: We could create a file in doc section of git.git or in Avery's repo named git_submodule_vs_git_subtree or just use a separate topic of the list.
The file would look like this:

git-submodule | feature                          | git-subtree
______________________________________________________________________
      +       | ability to tag submodule without |      -
  (comments)  | tagging the whole tree           | (comments)
______________________________________________________________________

Avery and Jens could add features they think are beneficial for one project or another and answer to each other this way. They could mark just presence or absence of the feature by +/- like above or specify key approaches how to do different things. For example, how to configure new submodule (main sequence of commands to create, add), how to do that with sub-tree... I think this simple feature matrix will answer a lot of questions. just my 2 cents... Thanks, Eugene ^ permalink raw reply [flat|nested] 58+ messages in thread
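As one possible entry for such a matrix, here is a rough sketch of the "main sequence of commands" for adding the same library each way, followed by a plain clone of each result. Everything named here (lib, app-submodule, app-subtree, vendor/lib, the new_repo helper) is hypothetical, the protocol.file.allow setting is only there because the demo clones from a local path, and it assumes git-subtree is installed.

```shell
#!/bin/sh
# Sketch: the same library added once as a submodule and once as a subtree,
# followed by a plain "git clone" of each result. All names are made up.
set -e
work=$(mktemp -d)

# new_repo <dir> <file> <content>: create a repo with one commit on "main".
new_repo() {
  git init -q "$work/$1"
  git -C "$work/$1" symbolic-ref HEAD refs/heads/main
  git -C "$work/$1" config user.name "Example"
  git -C "$work/$1" config user.email "example@example.com"
  echo "$3" > "$work/$1/$2"
  git -C "$work/$1" add "$2"
  git -C "$work/$1" commit -q -m "initial"
}

new_repo lib lib.txt "library code"
new_repo app-submodule app.txt "application"
new_repo app-subtree app.txt "application"

# git-submodule: records a URL in .gitmodules plus a commit link in the tree.
cd "$work/app-submodule"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "add lib as a submodule"

# git-subtree: copies the library's files into the superproject's own history.
cd "$work/app-subtree"
git subtree add --prefix=vendor/lib --squash "$work/lib" main

# What does a plain clone of each superproject contain?
cd "$work"
git clone -q app-submodule clone-submodule
git clone -q app-subtree clone-subtree
ls clone-submodule/vendor/lib   # empty until "git submodule update --init"
ls clone-subtree/vendor/lib     # lib.txt is already there
```

This corresponds to the first row of Jakub's table quoted earlier in the thread: submodules must be cloned recursively, while subtrees arrive with the superproject.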
* Re: Avery Pennarun's git-subtree? 2010-07-22 19:56 ` Jonathan Nieder 2010-07-22 20:06 ` Avery Pennarun 2010-07-22 20:17 ` Ævar Arnfjörð Bjarmason @ 2010-07-22 20:43 ` Elijah Newren 2010-07-22 21:32 ` Avery Pennarun 2 siblings, 1 reply; 58+ messages in thread From: Elijah Newren @ 2010-07-22 20:43 UTC (permalink / raw) To: Jonathan Nieder Cc: Avery Pennarun, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð, Bryan Larsen, git, Junio C Hamano, Linus Torvalds Hi, On Thu, Jul 22, 2010 at 1:56 PM, Jonathan Nieder <jrnieder@gmail.com> wrote: > Avery Pennarun wrote: > >> Unfortunately everything built *on top of* Linus's file format >> contribution has turned out to be a disaster. > > Aside: this kind of statement might make it unlikely for exactly > those who would benefit most from your opinions to read them. > > Well, that is my guess, anyway. I know that I have not found the time > to read your email (though I would like to) because I suspect based on > such sweeping statements that it would take a while to separate the > useful part from the rest. I'd usually agree with such a sentiment, but I don't think it's accurate in this case. Having read Avery's emails in this thread, I think he does a really good job explaining why submodules don't (and won't) work for a lot of people. I think he provided a better explanation than I could have for why I've never had much luck with submodules (and further convinced me that not only do they not work for me now, but they aren't ever going to fulfill the usecases I had). I can't really add much other than that we've been relatively happy with git-subtree and would like to see it or something like it merged. Our problems with it so far have turned out to be issues in other areas of git (e.g. the known issue about --prefix being ignored with the code being merged under a different directory due to rename-detection, and the bugs in merge-recursive's handling of D/F changes). 
Elijah ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 20:43 ` Elijah Newren @ 2010-07-22 21:32 ` Avery Pennarun 0 siblings, 0 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-22 21:32 UTC (permalink / raw) To: Elijah Newren Cc: Jonathan Nieder, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Thu, Jul 22, 2010 at 4:43 PM, Elijah Newren <newren@gmail.com> wrote: > (e.g. the known issue about --prefix being ignored with > the code being merged under a different directory due to > rename-detection, [...]) Aside: if this is the bug I think it is, then it's fixed by the git merge -Xsubtree feature, which has since been merged into git. (I think Elijah knew that, I just wanted to make sure it's clear to anyone else reading.) Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 19:41 ` Avery Pennarun 2010-07-22 19:56 ` Jonathan Nieder @ 2010-07-23 8:31 ` Chris Webb 2010-07-23 8:40 ` Avery Pennarun 2010-07-23 15:10 ` Jens Lehmann 2010-07-23 15:19 ` Marc Branchaud 3 siblings, 1 reply; 58+ messages in thread From: Chris Webb @ 2010-07-23 8:31 UTC (permalink / raw) To: Avery Pennarun Cc: Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds Avery Pennarun <apenwarr@gmail.com> writes: > I actually think Linus's contribution - the particular change to the > repo format to have trees link to commits - was exactly right. If we > want to talk about failings of git-subtree, they all precisely come > down to the fact that, because it has tree->tree links instead of > tree->commit links, it has to stash commitid information in the commit > message, which is gross and error prone. > > git-subtree would have benefitted from tree->commit links, but because > git's implementation of them is broken, that wasn't an option. I considered using submodules for one of my projects, and decided against for some of the usability reasons with multiple repositories which you highlight. (I didn't know about subtree.) You've surely considered this already, but reading your description in this thread, my first thought is that commits within trees could mean different things depending on whether they're at paths listed in .gitmodules or not. If the path is listed, the commit is in an external repository. If it isn't, it's a reference to a local commit, allowing submodules to live in the same repo as their parent and share some of the advantages you describe for sub-tree. Over time, git could then become smarter about recursing through commits in trees, although I can see a potential problem with needing to know about a .gitmodules blob in the top-level tree when we're examining a deeper level tree. Cheers, Chris. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 8:31 ` Chris Webb @ 2010-07-23 8:40 ` Avery Pennarun 2010-07-23 15:11 ` Jens Lehmann 2010-07-23 15:13 ` Jens Lehmann 0 siblings, 2 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-23 8:40 UTC (permalink / raw) To: Chris Webb Cc: Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Fri, Jul 23, 2010 at 4:31 AM, Chris Webb <chris@arachsys.com> wrote: > You've surely considered this already, but reading your description in this > thread, my first thought is that commits within trees could mean different > things depending on whether they're at paths listed in .gitmodules or not. > If the path is listed, the commit is in an external repository. If it isn't, > it's a reference to a local commit, allowing submodules to live in the same > repo as their parent and share some of the advantages you describe for > sub-tree. I think it would be better if we could abandon .gitmodules entirely; it's really only useful for listing repository URLs, and listing repository URLs is a major part of the problem. Something that would be neat, and at least vaguely backward-compatible would be to simply *try* fetching the linked commit objects from a remote repo, and checking them out from the local repo. If the objects exist, fetch/checkout of them will just work; if they don't, then it can (for backwards compatibility) revert to the current behaviour. Push would, if the objects exist, send them to the remote repo. Then there could be a .gitconfig option that flips this new behaviour on and off, ie. auto-checkouts subprojects that *can* be checked out without any extra knowledge, or not. If not, then you have to use the old-style git submodule stuff. (This proposal is not as easy as it sounds; to do it *right* would involve not having a separate .git repo for each subproject. That means changes to the index file format and a bunch of related stuff.
Though I guess you could keep the sub-repo stuff and it would still be better than what we have now.) Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 8:40 ` Avery Pennarun @ 2010-07-23 15:11 ` Jens Lehmann 2010-07-23 22:33 ` Avery Pennarun 2010-07-23 15:13 ` Jens Lehmann 1 sibling, 1 reply; 58+ messages in thread From: Jens Lehmann @ 2010-07-23 15:11 UTC (permalink / raw) To: Avery Pennarun Cc: Chris Webb, Jakub Narebski, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds Am 23.07.2010 10:40, schrieb Avery Pennarun: > I think it would be better if we could abandon .gitmodules entirely; > it's really only useful for listing repository URLs, and listing > repository URLs is a major part of the problem. Then where do you get the URL to clone the submodule from on "git clone --recursive"? ^ permalink raw reply [flat|nested] 58+ messages in thread
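For context on the question above, the following is a small sketch of where the submodule URL lives in the existing design: it is committed in .gitmodules, and 'git submodule init' copies it into the local .git/config, resolving relative URLs against the superproject's remote. The myapp, gem13, apenwarr and secondguy names are reused from the example earlier in the thread; the link target is a placeholder commit, and nothing in the sketch contacts the network.

```shell
#!/bin/sh
# Sketch: where a submodule URL lives today. Nothing here touches the network.
set -e
work=$(mktemp -d)
git init -q "$work/myapp"
cd "$work/myapp"
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin git://github.com/apenwarr/myapp
echo "app" > app.rb
git add app.rb
git commit -q -m "myapp: initial version"

# Committed content: a commit link at gem13/ plus a relative URL in .gitmodules.
# (The link target is a placeholder; a real setup would use "git submodule add".)
git update-index --add --cacheinfo "160000,$(git rev-parse HEAD),gem13"
git config -f .gitmodules submodule.gem13.path gem13
git config -f .gitmodules submodule.gem13.url ../gem13
git add .gitmodules
git commit -q -m "link gem13"

# "git submodule init" copies the URL into .git/config,
# resolving ../gem13 against the URL of the "origin" remote.
git submodule --quiet init
resolved=$(git config submodule.gem13.url)
echo "resolved: $resolved"

# A local override changes only .git/config; the committed .gitmodules,
# which is what everyone else clones, still says ../gem13.
git config submodule.gem13.url git://github.com/secondguy/gem13
echo "local:     $(git config submodule.gem13.url)"
echo "committed: $(git config -f .gitmodules submodule.gem13.url)"
```

The split between the committed URL and the local override is the mechanism behind options #1A and #1B discussed earlier in the thread.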
* Re: Avery Pennarun's git-subtree? 2010-07-23 15:11 ` Jens Lehmann @ 2010-07-23 22:33 ` Avery Pennarun 0 siblings, 0 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-23 22:33 UTC (permalink / raw) To: Jens Lehmann Cc: Chris Webb, Jakub Narebski, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Fri, Jul 23, 2010 at 11:11 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote: > Am 23.07.2010 10:40, schrieb Avery Pennarun: >> I think it would be better if we could abandon .gitmodules entirely; >> it's really only useful for listing repository URLs, and listing >> repository URLs is a major part of the problem. > > Then where do you get the URL to clone the submodule from on "git > clone --recursive"? If you're asking that question, you're missing my point entirely. In my proposed model, the submodule objects are all in the same repo as the superproject, so there *is* no separate URL. And thus there is no more need for .gitmodules. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 8:40 ` Avery Pennarun 2010-07-23 15:11 ` Jens Lehmann @ 2010-07-23 15:13 ` Jens Lehmann 1 sibling, 0 replies; 58+ messages in thread From: Jens Lehmann @ 2010-07-23 15:13 UTC (permalink / raw) To: Avery Pennarun Cc: Chris Webb, Jakub Narebski, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds Am 23.07.2010 10:40, schrieb Avery Pennarun: > I think it would be better if we could abandon .gitmodules entirely; > it's really only useful for listing repository URLs, and listing > repository URLs is a major part of the problem. Then where do you get the URL to clone the submodule from on "git clone --recursive"? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 19:41 ` Avery Pennarun 2010-07-22 19:56 ` Jonathan Nieder 2010-07-23 8:31 ` Chris Webb @ 2010-07-23 15:10 ` Jens Lehmann 2010-07-23 16:05 ` Bryan Larsen 2010-07-23 22:32 ` Avery Pennarun 2010-07-23 15:19 ` Marc Branchaud 3 siblings, 2 replies; 58+ messages in thread From: Jens Lehmann @ 2010-07-23 15:10 UTC (permalink / raw) To: Avery Pennarun Cc: Jakub Narebski, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds, Heiko Voigt Am 22.07.2010 21:41, schrieb Avery Pennarun: > I create an app named myapp on github: > > git://github.com/apenwarr/myapp > > It uses 17 different ruby gems, which I import as subprojects. I have > two choices: > > [1] .gitmodules can use absolute paths to the original gem locations: > > git://github.com/rubygems/gem[1..n] > > [2] Or else I can fork them all and use relative paths in .gitmodules: > > ../gem[1..n] > translates to --> git://github.com/apenwarr/gem[1..n] You forgot what we do as best practice at work: [3] Fork the gem repos on github (or another server reachable by your co-workers) and use those, so you don't have to change the URL later: git://github.com/apenwarrrubygems/gem[1..n] Your problems go away, setup has to be done only once on project start and not for every developer, you can use your own branchnames and you have a staging repo from where you can push patches upstream if necessary. > Surely including *repository URLs* inside the *repository content* is > at least as bad as including branch names. If we're going to do one, > we might as well do the other. But it won't help, because the stored > branch name will probably be 'master', and my personal hacked-up copy > of gem13 shouldn't be on a branch named master anyway. You sure are aware that having a branch name associated with a submodule checkout is a request repeatedly made? 
> The *real* need for submodules - the need you can't fix without > submodules - has nothing to do with these requirements. It's about > each submodule wanting to have its own lifecycle, owner, changelog, > and release process, and - perhaps this is actually the killer > requirement - each supermodule wanting to be able to cleanly rewind a > submodule if they don't like the new version. That is just one example. Another one is code shared between different repos (think: libraries) where you want to make sure that a bugfix in the library made in project A will make it to the shared code repo and thus doesn't have to be fixed again by projects B to X. This was one of the reasons we preferred submodules over subtrees in our evaluation, because there is no incentive to push fixes inside the subtree back to its own repo like there is when using submodules. >>> It's very clear that git-submodule's current behaviour totally >>> mismatches the entire git philosophy. That's why it's so impossible >>> to make the git-submodule command usable. >> >> That's very strong accusation. > > Agreed... but that doesn't make it wrong :) But calling a feature "impossible to make ... usable" is an interesting thing to say about a feature lots of people are using productively in their daily work, no? ;-) >> rebase and merge needs separate | rebase and merge works normally >> work in submodule currently | > > True. Nope, there is a patch in pu doing that when it is a simple fast forward and giving you advice when both sides are already merged inside the submodule (CCed Heiko, because he is the author of that feature) It is the /commits/ that have to be done twice, once in the submodule and then in the superproject. (But that is not necessarily bad, imagine having git gui as a submodule: you would be automagically reminded that stuff for git gui should be sent somewhere else than to Junio). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 15:10 ` Jens Lehmann @ 2010-07-23 16:05 ` Bryan Larsen 2010-07-23 17:11 ` Jens Lehmann 2010-07-23 22:32 ` Avery Pennarun 1 sibling, 1 reply; 58+ messages in thread From: Bryan Larsen @ 2010-07-23 16:05 UTC (permalink / raw) To: Jens Lehmann Cc: Avery Pennarun, Jakub Narebski, Ævar Arnfjörð Bjarmason, git, Junio C Hamano, Linus Torvalds, Heiko Voigt On 10-07-23 11:10 AM, Jens Lehmann wrote: > Am 22.07.2010 21:41, schrieb Avery Pennarun: >> I create an app named myapp on github: >> >> git://github.com/apenwarr/myapp >> >> It uses 17 different ruby gems, which I import as subprojects. I have >> two choices: >> >> [1] .gitmodules can use absolute paths to the original gem locations: >> >> git://github.com/rubygems/gem[1..n] >> >> [2] Or else I can fork them all and use relative paths in .gitmodules: >> >> ../gem[1..n] >> translates to --> git://github.com/apenwarr/gem[1..n] > > You forgot what we do as best practice at work: > > [3] Fork the gem repos on github (or another server reachable by your > co-workers) and use those, so you don't have to change the URL > later: > > git://github.com/apenwarrrubygems/gem[1..n] > > Your problems go away, setup has to be done only once on project > start and not for every developer, you can use your own branchnames > and you have a staging repo from where you can push patches upstream > if necessary. What's best practice for open source projects? I do this, but nobody except my coworkers can push to my forks, so it's a huge rigamarole just to get a fix into a submodule. > > That is just one example. Another one is code shared between > different repos (think: libraries) where you want to make sure that > a bugfix in the library made in project A will make it to the shared > code repo and thus doesn't have to be fixed again by projects B to X. 
> This was one of the reasons we preferred submodules over subtrees > in our evaluation, because there is no incentive to push fixes inside > the subtree back to its own repo like there is when using submodules. But you stated above that each project has its own fork of the library. So there's no special incentive to push changes from the fork back to its master repo. > > >>>> It's very clear that git-submodule's current behaviour totally >>>> mismatches the entire git philosophy. That's why it's so impossible >>>> to make the git-submodule command usable. >>> >>> That's very strong accusation. >> >> Agreed... but that doesn't make it wrong :) > > But calling a feature "impossible to make ... usable" is an > interesting thing to say about a feature lots of people are > using productively in their daily work, no? ;-) In my experience, it's possible to make it usable if and only if: 1. you have a small team 2. all of whom are very comfortable with git 3. changes inside submodules are either infrequent or only happen in a single direction 4. the project is not public/open source I think #4 is the killer reason why submodules don't work. It works fine if the submodule is fairly independent, but if you have a patch to the submodule that was created for and in the context of the superproject, things get really annoying really quickly. Bryan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 16:05 ` Bryan Larsen @ 2010-07-23 17:11 ` Jens Lehmann 2010-07-23 19:01 ` Bryan Larsen 0 siblings, 1 reply; 58+ messages in thread From: Jens Lehmann @ 2010-07-23 17:11 UTC (permalink / raw) To: Bryan Larsen Cc: Avery Pennarun, Jakub Narebski, Ævar Arnfjörð Bjarmason, git, Junio C Hamano, Linus Torvalds, Heiko Voigt On 23.07.2010 18:05, Bryan Larsen wrote: > On 10-07-23 11:10 AM, Jens Lehmann wrote: >> That is just one example. Another one is code shared between >> different repos (think: libraries) where you want to make sure that >> a bugfix in the library made in project A will make it to the shared >> code repo and thus doesn't have to be fixed again by projects B to X. >> This was one of the reasons we preferred submodules over subtrees >> in our evaluation, because there is no incentive to push fixes inside >> the subtree back to its own repo like there is when using submodules. > > But you stated above that each project has its own fork of the library. So there's no special incentive to push changes from the fork back to its master repo. When you are not working on your own, it is preferable to be able to get changes upstream into a submodule's repo to share them. So if you can do that (either via push or patches sent by email or whatever), then use its URL directly (and then you have the incentive that fixes get pushed, which is nice). Or, if you can't, use a fork reachable by the people you work with (then you still can see all fixes made by your group in the forked repo and can decide to push them upstream). Then pushing fixes back to the original repo is a matter of courtesy, as it is with every other workflow I know. And I think that is just the same thing we all do with plain git repos when working with others: If you can push, you use it directly to clone from, if you can't, you fork it. > In my experience, it's possible to make it usable if and only if: > > 1. you have a small team > 2. 
all of whom are very comfortable with git > 3. changes inside submodules are either infrequent or only happen in a single direction > 4. the project is not public/open source > > I think #4 is the killer reason why submodules don't work. It works fine if the submodule is fairly independent, but if you have a patch to the submodule that was created for and in the context of the superproject, things get really annoying really quickly. What is the problem with the "forked repo" solution for #4? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 17:11 ` Jens Lehmann @ 2010-07-23 19:01 ` Bryan Larsen 0 siblings, 0 replies; 58+ messages in thread From: Bryan Larsen @ 2010-07-23 19:01 UTC (permalink / raw) To: Jens Lehmann Cc: Avery Pennarun, Jakub Narebski, Ævar Arnfjörð Bjarmason, git, Junio C Hamano, Linus Torvalds, Heiko Voigt On 10-07-23 01:11 PM, Jens Lehmann wrote: > Am 23.07.2010 18:05, schrieb Bryan Larsen: >> On 10-07-23 11:10 AM, Jens Lehmann wrote: >>> That is just one example. Another one is code shared between >>> different repos (think: libraries) where you want to make sure that >>> a bugfix in the library made in project A will make it to the shared >>> code repo and thus doesn't have to be fixed again by projects B to X. >>> This was one of the reasons we preferred submodules over subtrees >>> in our evaluation, because there is no incentive to push fixes inside >>> the subtree back to its own repo like there is when using submodules. >> >> But you stated above that each project has its own fork of the library. So there's no special incentive to push changes from the fork back to its master repo. > > When you are not working on your own, it is preferable to be able to > get changes upstream into a submodules repo to share them. > So if you can do that (either via push or patches sent by email or > whatever), then use it's URL directly (and then you have the incentive > that fixes get pushed, which is nice). > Or you can't, then use a fork reachable by the people you work with > (then you still can see all fixes made by your group in the forked > repo and can decide to push them upstream). Then pushing fixes back > to the original repo is a matter of courtesy, as it is with every > other work flow I know. > And I think that is just the same thing we all do with plain git > repos when working with others: If you can push, you use it directly > to clone from, if you can't, you fork it. 
So basically you're saying: sometimes you can use a non-forked repository, which has a whole bunch of disadvantages, but has the minor advantage that you're "forced" to push your changes upstream. Which I see as a disadvantage because that means you're pushing untested changes. Or else you use a forked repo, which is basically the same as using git-subtree, except for a lot of additional admin hassle. > > >> In my experience, it's possible to make it usable if and only if: >> >> 1. you have a small team >> 2. all of whom are very comfortable with git >> 3. changes inside submodules are either infrequent or only happen in a single direction >> 4. the project is not public/open source >> >> I think #4 is the killer reason why submodules don't work. It works fine if the submodule is fairly independent, but if you have a patch to the submodule that was created for and in the context of the superproject, things get really annoying really quickly. > > What is the problem with the "forked repo" solution for #4? > Please tell me how I can set up a public project on github where project A contains module X, so that Joe Average User can clone A, make a change in the module X and send a simple pull request to get that change into A. The change is one that's inappropriate to push upstream to X without additional work, but is appropriate for A at this point in time. Joe's a beginning git user. That's actually a simple use case compared to others I've run into. Bryan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 15:10 ` Jens Lehmann 2010-07-23 16:05 ` Bryan Larsen @ 2010-07-23 22:32 ` Avery Pennarun 2010-07-25 19:57 ` Jens Lehmann 1 sibling, 1 reply; 58+ messages in thread From: Avery Pennarun @ 2010-07-23 22:32 UTC (permalink / raw) To: Jens Lehmann Cc: Jakub Narebski, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds, Heiko Voigt On Fri, Jul 23, 2010 at 11:10 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote: > You forgot what we do as best practice at work: > > [3] Fork the gem repos on github (or another server reachable by your > co-workers) and use those, so you don't have to change the URL > later: > > git://github.com/apenwarrrubygems/gem[1..n] > > Your problems go away, setup has to be done only once on project > start and not for every developer, you can use your own branchnames > and you have a staging repo from where you can push patches upstream > if necessary. Now all your fellow developers have to push their submodule code to a single upstream repo? That's rather centralized and un-git-like. For the rest, Bryan Larsen answered this one well, and I agree with him. >> Surely including *repository URLs* inside the *repository content* is >> at least as bad as including branch names. If we're going to do one, >> we might as well do the other. But it won't help, because the stored >> branch name will probably be 'master', and my personal hacked-up copy >> of gem13 shouldn't be on a branch named master anyway. > > You sure are aware that having a branch name associated with a > submodule checkout is a request repeatedly made? Of course it is; I requested it myself. Then, two years later after thinking about the problem a lot and writing git-subtree out of frustration, I realized that even if this feature existed, it wouldn't help at all. If you use git-submodule, you must push your submodule commits separately or the supermodule is broken for everybody but you. 
To push a submodule, you need a) an upstream to push to and b) a branch name. It's easy to forget to create a branch name, so of course people request that feature. However, the real problem is "you must push your submodule commits separately." Fix that, and I can guarantee that the request for submodule branch naming will disappear. > That is just one example. Another one is code shared between > different repos (think: libraries) where you want to make sure that > a bugfix in the library made in project A will make it to the shared > code repo and thus doesn't have to be fixed again by projects B to X. > This was one of the reasons we preferred submodules over subtrees > in our evaluation, because there is no incentive to push fixes inside > the subtree back to its own repo like there is when using submodules. I think you'd like svn; it's pretty cool. All changes made to a project need to get pushed to a central upstream repo so you never forget to share them. >>> rebase and merge needs separate | rebase and merge works normally >>> work in submodule currently | >> >> True. > > Nope, there is a patch in pu doing > that when it is a simple fast forward > and giving you advice when both sides > are already merged inside the submodule > (CCed Heiko, because he is the author > of that feature) Fast forwards are not merges, and pu is not now. > It is the /commits/ that have to be > done twice, once in the submodule and > then in the superproject. (But that is > not necessarily bad, imagine having git > gui as a submodule: you would be > automagically reminded that stuff for > git gui should be sent somewhere else > than to Junio). Yup, I agree that requiring a separate commit to the submodule repo is not a bad idea. I always do this anyway even when using git-subtree, because I'm thinking ahead to the day when I'll push my submodule changes upstream and I want my commit message to make sense. But that's because I think ahead like that. 
Having the tool force me to do it would be harmless and help people avoid mistakes. The syntax for it ought to be nice though. I should be able to do: git commit -- path/to/submodule And have it commit everything in the submodule tree as a new commit in the submodule. I don't want to have to think about cd'ing to path/to/submodule just so I can commit the files I changed in there. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
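The "commits have to be done twice" point discussed above can be tried in a throwaway directory. This is only a sketch under simplifying assumptions: the nested repository `lib` is created in place and added as a bare gitlink, with no .gitmodules entry and no remote, whereas a real setup would use `git submodule add <url>`. The names `super`, `lib` and `code.c` are invented for the example.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q super
cd super
git init -q lib                      # nested repo standing in for a submodule

# Commit 1: inside the submodule, where you have to cd today.
echo 'fix' > lib/code.c
(cd lib && git add code.c && git commit -q -m 'lib: fix bug in code.c')

# Commit 2: in the superproject, recording the submodule's new HEAD.
git add lib 2>/dev/null              # stored as a gitlink (mode 160000); hint text silenced
git commit -q -m 'Update lib to pick up the code.c fix'

# Show what the superproject actually recorded for lib.
git ls-files -s lib
```

The last command prints a mode 160000 entry whose hash is the commit made inside lib. The superproject stores only that pointer, so the lib commit still has to be pushed somewhere reachable before anyone else can use the superproject commit.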
* Re: Avery Pennarun's git-subtree? 2010-07-23 22:32 ` Avery Pennarun @ 2010-07-25 19:57 ` Jens Lehmann 2010-07-27 18:40 ` Avery Pennarun 0 siblings, 1 reply; 58+ messages in thread From: Jens Lehmann @ 2010-07-25 19:57 UTC (permalink / raw) To: Avery Pennarun Cc: Jakub Narebski, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds, Heiko Voigt Am 24.07.2010 00:32, schrieb Avery Pennarun: > On Fri, Jul 23, 2010 at 11:10 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote: >> You forgot what we do as best practice at work: >> >> [3] Fork the gem repos on github (or another server reachable by your >> co-workers) and use those, so you don't have to change the URL >> later: >> >> git://github.com/apenwarrrubygems/gem[1..n] >> >> Your problems go away, setup has to be done only once on project >> start and not for every developer, you can use your own branchnames >> and you have a staging repo from where you can push patches upstream >> if necessary. > > Now all your fellow developers have to push their submodule code to a > single upstream repo? That's rather centralized and un-git-like. But isn't that exactly the same thing you would have to do for your superproject too to be able to push your changes for your fellows? >> It is the /commits/ that have to be >> done twice, once in the submodule and >> then in the superproject. (But that is >> not necessarily bad, imagine having git >> gui as a submodule: you would be >> automagically reminded that stuff for >> git gui should be sent somewhere else >> than to Junio). > > Yup, I agree that requiring a separate commit to the submodule repo is > not a bad idea. I always do this anyway even when using git-subtree, > because I'm thinking ahead to the day when I'll push my submodule > changes upstream and I want my commit message to make sense. But > that's because I think ahead like that. Having the tool force me to > do it would be harmless and help people avoid mistakes. And submodules force you to do that. 
> The syntax for it ought to be nice though. I should be able to do: > > git commit -- path/to/submodule > > And have it commit everything in the submodule tree as a new commit in > the submodule. I don't want to have to think about cd'ing to > path/to/submodule just so I can commit the files I changed in there. Yes, that would be a nice feature (assuming you have a branch in the submodule to commit these changes to ;-). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-25 19:57 ` Jens Lehmann @ 2010-07-27 18:40 ` Avery Pennarun 2010-07-27 21:14 ` Jens Lehmann 0 siblings, 1 reply; 58+ messages in thread From: Avery Pennarun @ 2010-07-27 18:40 UTC (permalink / raw) To: Jens Lehmann Cc: Jakub Narebski, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds, Heiko Voigt On Sun, Jul 25, 2010 at 09:57:55PM +0200, Jens Lehmann wrote: > Am 24.07.2010 00:32, schrieb Avery Pennarun: > > On Fri, Jul 23, 2010 at 11:10 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote: > >> You forgot what we do as best practice at work: > >> > >> [3] Fork the gem repos on github (or another server reachable by your > >> co-workers) and use those, so you don't have to change the URL > >> later: > >> > >> git://github.com/apenwarrrubygems/gem[1..n] > >> > >> Your problems go away, setup has to be done only once on project > >> start and not for every developer, you can use your own branchnames > >> and you have a staging repo from where you can push patches upstream > >> if necessary. > > > > Now all your fellow developers have to push their submodule code to a > > single upstream repo? That's rather centralized and un-git-like. > > But isn't that exactly the same thing you would have to do for your > superproject too to be able to push your changes for your fellows? No. On github, only I can push to my superproject's history, and yet everyone can still pull from me. With what you're proposing, for all my submodules, we can't each have our own project; we all have to push to the shared one. (Just to be clear: I don't want to fork *every submodule by hand every time*. I just want *my* stuff to be in *my* repo. The easiest way to do this would be to have all my changes in a single repo, ie. my fork of the superproject.) > >> It is the /commits/ that have to be > >> done twice, once in the submodule and > >> then in the superproject. 
(But that is > >> not necessarily bad, imagine having git > >> gui as a submodule: you would be > >> automagically reminded that stuff for > >> git gui should be sent somewhere else > >> than to Junio). > > > > Yup, I agree that requiring a separate commit to the submodule repo is > > not a bad idea. I always do this anyway even when using git-subtree, > > because I'm thinking ahead to the day when I'll push my submodule > > changes upstream and I want my commit message to make sense. But > > that's because I think ahead like that. Having the tool force me to > > do it would be harmless and help people avoid mistakes. > > And submodules force you to do that. Yes. This is a limitation of submodules, but not one that bothers me. And it encourages good behaviour. > > The syntax for it ought to be nice though. I should be able to do: > > > > git commit -- path/to/submodule > > > > And have it commit everything in the submodule tree as a new commit in > > the submodule. I don't want to have to think about cd'ing to > > path/to/submodule just so I can commit the files I changed in there. > > Yes, that would be a nice feature (assuming you have a branch in the > submodule to commit these changes to ;-). No, I explicitly *don't* want to have to have a branch in the submodule; that's too much extra thinking at that stage. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-27 18:40 ` Avery Pennarun @ 2010-07-27 21:14 ` Jens Lehmann 0 siblings, 0 replies; 58+ messages in thread From: Jens Lehmann @ 2010-07-27 21:14 UTC (permalink / raw) To: Avery Pennarun Cc: Jakub Narebski, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds, Heiko Voigt On 27.07.2010 20:40, Avery Pennarun wrote: > With what you're proposing, for all my submodules, we can't each have our > own project; we all have to push to the shared one. > > (Just to be clear: I don't want to fork *every submodule by hand every > time*. I just want *my* stuff to be in *my* repo. The easiest way to do > this would be to have all my changes in a single repo, ie. my fork of the > superproject.) Fair enough, but that would not be the Right Thing for my use cases. (E.g. I am using submodules to have a single upstream repo for a library which I use in almost all my projects. And fixes to that library I do in one of these projects shall be fetchable in all other projects right after I pushed them to the submodule's repo, without having to push them out of the superproject's repo into the shared one /again/. The situation at my day job is the same and I assume a lot of people are using submodules this way). So I would vote for not breaking the *feature* submodules currently have: to use a different repo than that used for the superproject. Because that enables you to have shared content. I am not against having the /choice/ to have the submodules' objects in the same repo as the superproject, but that should be an option and not mandatory. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-22 19:41 ` Avery Pennarun ` (2 preceding siblings ...) 2010-07-23 15:10 ` Jens Lehmann @ 2010-07-23 15:19 ` Marc Branchaud 2010-07-23 22:50 ` Avery Pennarun 3 siblings, 1 reply; 58+ messages in thread From: Marc Branchaud @ 2010-07-23 15:19 UTC (permalink / raw) To: Avery Pennarun Cc: Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On 10-07-22 03:41 PM, Avery Pennarun wrote: > > 1) Sometimes I want to clone only some subdirs of a project > 2) Sometimes I don't want the entire history because it's too big. > 3) Super huge git repositories start to degrade in performance. The reason we turned to submodules is precisely to deal with repository size. Our code base encompasses the entire FreeBSD tree plus different versions of the Linux kernel, along with various third-party libraries & apps. You don't need everything to build a given product (a FreeBSD product doesn't use any Linux kernels, for example) but because all the products share common code we need to be able to branch and tag the common code along with the uncommon code. So a straight "git clone" that would need to fetch all of FreeBSD plus 4 different Linux kernels and check all that out is a major problem, especially for our automated build system (which could definitely be implemented better, but still). In truth it's the checkout that takes the most time by far, though commands like git-status also take inconveniently long. We chose git-submodule over git-subtree mainly because git-submodule lets us selectively checkout different parts of our code. (AFAIK sparse checkouts aren't yet an option.) We didn't really consider git-subtree because it's not an official part of git, and we didn't want to have to teach (and nag) all our developers to install and maintain it in addition to keeping up with git itself. 
Besides, git-submodule's collection-of-independent-repos model works fairly well in our situation, though the implementation could definitely be improved (and Jens's list is a really good start). Neither submodule nor subtree really solves our situation, but right now git-submodule is the only thing "official" git offers to manage loosely-coupled code. It would be nice to see git-subtree added to the toolkit, but it would be even nicer if git had better ways to deal with "vast" repositories. Another tool folks should keep in mind in this discussion is 'repo' which Google built for the Android project. Android's code base is also too vast to work well in a single git repository, and I don't think subtrees or submodules would be a good match for them either. M. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 15:19 ` Marc Branchaud @ 2010-07-23 22:50 ` Avery Pennarun 2010-07-24 0:58 ` skillzero ` (3 more replies) 0 siblings, 4 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-23 22:50 UTC (permalink / raw) To: Marc Branchaud Cc: Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Fri, Jul 23, 2010 at 11:19 AM, Marc Branchaud <marcnarc@xiplink.com> wrote: > On 10-07-22 03:41 PM, Avery Pennarun wrote: >> 1) Sometimes I want to clone only some subdirs of a project >> 2) Sometimes I don't want the entire history because it's too big. >> 3) Super huge git repositories start to degrade in performance. > > The reason we turned to submodules is precisely to deal with repository size. I believe that's very common. However, I wonder whether that's actually a good reason for git to develop better submodules, or actually just a good reason for git to get better support for handling huge repositories. My bup project (http://github.com/apenwarr/bup) is all about huge repositories. It handles repositories with hundreds of gigabytes, and trees containing millions of files (entire filesystems), quite nicely. Of course, it's not a version control system, so it won't solve your problems. It's just evidence that large repositories are actually quite manageable without changing the fundamentals of git. > Our code base encompasses the entire FreeBSD tree plus different versions of > the Linux kernel, along with various third-party libraries & apps. You don't > need everything to build a given product (a FreeBSD product doesn't use any > Linux kernels, for example) but because all the products share common code we > need to be able to branch and tag the common code along with the uncommon code. Honest question: do you care about the wasted disk space and download time for these extra files? Or just the fact that git gets slow when you have them? 
How people answer that question very much affects the way git should be designed. > So a straight "git clone" that would need to fetch all of FreeBSD plus 4 > different Linux kernels and check all that out is a major problem, especially > for our automated build system (which could definitely be implemented better, > but still). To be absolutely pedantic, the four linux kernels likely share most of their objects and so you're only paying the cost (at least during fetch) of including it once :) (If you're actually using git-submodule and each copy of the kernel is its own module, then it might be cloning the kernel four times separately, in which case the objects *don't* get shared, so this ends up being much more expensive than it should be. That could be fixed by slightly improving git-submodule to share some objects rather than rearchitecting it though.) > In truth it's the checkout that takes the most time by far, > though commands like git-status also take inconveniently long. Yeah, git could stand to be optimized a bit here. And since Windows stats files about 10x slower than Linux, this problem occurs about 10x sooner on Windows, which makes using git on Windows (which sadly I have to do sometimes) extremely painful compared to Linux. IMHO, the correct answer here is to have an inotify-based daemon prod at the .git/index automatically when files get updated, so that git itself doesn't have to stat/readdir through the entire tree in order to do any of its operations. (Windows also has something like inotify that would work.) If you had this, then git status/diff/checkout/commit would be just as fast with zillions of files as with 10 files. Sooner or later, if nobody implements this, I promise I'll get around to it since inotify is actually easy to code for :) Also note that the only reason submodules are faster here is that they're ignoring possibly important changes. 
Notably, when you do 'git status' from the top level, it won't warn you if you have any not-yet-committed files in any of your submodules. Personally, I consider that to be really important information, but to obtain it would make 'git status' take just as long as without submodules, so you wouldn't get any benefit. (I think nowadays there's a way to get this recursive status information if you want it, but it'll be slow of course.) > We chose git-submodule over git-subtree mainly because git-submodule lets us > selectively checkout different parts of our code. (AFAIK sparse checkouts > aren't yet an option.) Fair enough. If you could confirm or deny my theory that this is *entirely* a performance related concern (as opposed to disk space / download time), that would be helpful. > We didn't really consider git-subtree because it's > not an official part of git, and we didn't want to have to teach (and nag) > all our developers to install and maintain it in addition to keeping up with > git itself. Arguably, this is a vote for including git-subtree into the core (which was Bryan's point when he started this thread); it obviously is being rejected sometimes by git users simply because it's not in the core, even though it could help them. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
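Whether a top-level 'git status' looks inside submodules can be checked directly. The sketch below assumes a git recent enough to have `git status --ignore-submodules` (as far as I know, 1.7.2 or later) and reuses a simplified setup: `lib` is a nested repository added as a bare gitlink, with invented names and no .gitmodules entry.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q super
cd super
git init -q lib
echo 'v1' > lib/code.c
(cd lib && git add code.c && git commit -q -m 'lib: initial code')
git add lib 2>/dev/null              # record lib as a gitlink
git commit -q -m 'Add lib'

# Dirty the submodule's work tree without committing anything.
echo 'v2' > lib/code.c

# Default: status descends into lib to see whether it is dirty.
with_scan=$(git status --porcelain)
# Cheaper: only compare lib's HEAD with the recorded commit.
without_scan=$(git status --porcelain --ignore-submodules=dirty)

printf 'default:       [%s]\n' "$with_scan"
printf 'ignoring dirt: [%s]\n' "$without_scan"
```

The second form is the trade-off described above: it avoids walking the submodule's files, and in exchange it will not mention the uncommitted change in lib/code.c.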
* Re: Avery Pennarun's git-subtree? 2010-07-23 22:50 ` Avery Pennarun @ 2010-07-24 0:58 ` skillzero 2010-07-24 1:20 ` Avery Pennarun 2010-07-26 8:56 ` Jakub Narebski 2010-07-24 20:07 ` Sverre Rabbelier ` (2 subsequent siblings) 3 siblings, 2 replies; 58+ messages in thread From: skillzero @ 2010-07-24 0:58 UTC (permalink / raw) To: Avery Pennarun Cc: Marc Branchaud, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Fri, Jul 23, 2010 at 3:50 PM, Avery Pennarun <apenwarr@gmail.com> wrote: > Honest question: do you care about the wasted disk space and download > time for these extra files? Or just the fact that git gets slow when > you have them? I have the similar situation to the original poster (huge trees) and for me it's all three: disk space, download time, and performance. My tree has a few relatively small (< 20 MB) shared directories of common code, a few large (2-6 GB) directories of code for OS's, and then several medium size (< 500 MB) directories for application code. The application developers only care about the app+shared directories (and are very annoyed by the massive space and performance impact of the OS directories). The firmware-only developers only care about OS+shared and are mildly annoyed by the medium space and performance impact of the app directories. I work on all of the pieces, but even I would prefer to have things separated so when I work on the apps, git status/etc doesn't take a big hit for close to a million files in the OS directories (particularly when doing git status on Windows). Even when using the -uno option to git status, it's still pretty slow (over a minute). git-submodule might be technically possible in this situation, but having to commit and push each submodule and then commit and push the super module makes it slightly worse than just dealing with the space/download/performance issues of one huge repository. 
git-subtree could also possibly help, but there's still extra work to split and merge each repository. And I'm not sure how it handles commit IDs across the repositories because I want to be able to say "I fixed that bug in shared/code.c in commit abc123" and have both the OS+shared and the apps+shared people be able to git log abc123 and see the same change (and merge/cherry-pick/etc.). I think what I want is a way to do a sparse checkout where some sort of module is maintained in the git repository (probably just an INI-style file with paths) so I can clone directly from the server and it figures out the objects I need for the full history of only apps+shared (or firmware+shared, etc.) on the server side and only sends those objects. I still want to be able to branch, tag, and refer to commit IDs. So I only take the space/download/performance hit of directories included in the module, but I don't have to manually maintain that view of the repository (as I do with git-submodule and git-subtree). The closest thing to that so far for me has been the sparse checkout support added in git 1.7 combined with a convenience script I wrote. Everyone still has a huge download and .git directory, but at least the working copy is limited to the paths specified in the module so git status isn't super slow (although just having all those objects in the .git directory still slows it down quite a bit). ^ permalink raw reply [flat|nested] 58+ messages in thread
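The sparse checkout setup mentioned above looks roughly like this with the git 1.7 mechanism (core.sparseCheckout plus .git/info/sparse-checkout). This is not the convenience script from the message, which was not posted; the directory names apps, shared and os are invented to mirror the layout described, and the repository is a tiny stand-in built on the spot.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commit works on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q big
cd big
mkdir apps shared os
echo app > apps/main.c
echo lib > shared/code.c
echo kernel > os/kernel.c
git add .
git commit -q -m 'Import apps, shared and os'

# The "module": the paths an application developer wants checked out.
git config core.sparseCheckout true
mkdir -p .git/info
printf '%s\n' '/apps/' '/shared/' > .git/info/sparse-checkout

# Re-read HEAD so the work tree is trimmed to the listed paths.
git read-tree -mu HEAD

ls
```

After this only apps and shared remain in the work tree, while os/kernel.c is still in the index and in history. That matches the limitation noted in the message: the download and the .git directory are as large as before, and only the checked-out files shrink.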
* Re: Avery Pennarun's git-subtree? 2010-07-24 0:58 ` skillzero @ 2010-07-24 1:20 ` Avery Pennarun 2010-07-24 19:40 ` skillzero 2010-07-26 16:37 ` Marc Branchaud 2010-07-26 8:56 ` Jakub Narebski 1 sibling, 2 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-24 1:20 UTC (permalink / raw) To: skillzero Cc: Marc Branchaud, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Fri, Jul 23, 2010 at 8:58 PM, <skillzero@gmail.com> wrote: > On Fri, Jul 23, 2010 at 3:50 PM, Avery Pennarun <apenwarr@gmail.com> wrote: >> Honest question: do you care about the wasted disk space and download >> time for these extra files? Or just the fact that git gets slow when >> you have them? > > I have the similar situation to the original poster (huge trees) and > for me it's all three: disk space, download time, and performance. My > tree has a few relatively small (< 20 MB) shared directories of common > code, a few large (2-6 GB) directories of code for OS's, and then > several medium size (< 500 MB) directories for application code. The > application developers only care about the app+shared directories (and > are very annoyed by the massive space and performance impact of the OS > directories). Given how cheap disk space is nowadays, I'm curious about this. Are they really just annoyed by the performance problem, and they complain about the extra size because they blame the performance on the extra files? Or are they honestly short of disk space? Similarly, are all your developers located at the same office? If so, then bandwidth ought not be an issue. I'm pushing extra hard on this because I believe there are lots of opportunities to just improve git performance on huge repositories. And if the only *real* reason people need to split repositories is that performance goes down, then that's fixable, and you may need neither git-submodule nor git-subtree. 
> I work on all of the pieces, but even I would > prefer to have things separated so when I work on the apps, git > status/etc doesn't take a big hit for close to a million files in the > OS directories (particularly when doing git status on Windows). Even > when using the -uno option to git status, it's still pretty slow (over > a minute). This is indeed a problem with large repositories. Of course, splitting them with git-submodule is kind of cheating, because it just makes git-status *not look* to see if those files are dirty or not. If they are dirty and you forget to commit them, you'll never know until someone tells you later. It would be functionally equivalent to just have git-status not look inside certain subdirs of a single repository. In any case, this is a pretty clear optimization target (especially since Windows is so amazingly slow at statting files): just have a daemon running inotify (or the Windows equivalent) that tracks whether files are up-to-date or not. Then git would never need to recurse through the entire tree, and operations like status, diff, checkout, and commit could be fast even with a million-file repository. > git-subtree could also possibly help, but there's still extra work to > split and merge each repository. And I'm not sure how it handles > commit IDs across the repositories because I want to be able to say "I > fixed that bug in shared/code.c in commit abc123" and have both the > OS+shared and the apps+shared people be able git log abc123 and see > the same change (and merge/cherry-pick/etc.). git-subtree (if you don't use --squash) keeps all the commit IDs. It is extra work to split and merge between repositories, though. It doesn't solve your repository-is-too-large problem. 
> I think what I want is a way to do a sparse checkout where some sort > of module is maintained in the git repository (probably just an > INI-style file with paths) so I can clone directly from the server and > it figures out the objects I need for the full history of only > apps+shared (or firmware+shared, etc.) on the server side and only > sends those objects. I still want to be able to branch, tag, and refer > to commit IDs. So I only take the space/download/performance hit of > directories included in the module, but I don't have to manually > maintain that view of the repository (as I do with git-submodule and > git-subtree). Yes, better sparse checkout and sparse fetch would be very valuable here and would eliminate a lot of the reasons people have for misusing submodules. > (although just having all those objects in > the .git directory still slows it down quite a bit). You're the second person who has mentioned this today (the first one was to me in a private email). I'd like to understand this better. In my bup project (http://github.com/apenwarr/bup) we regularly create git repositories with hundreds of gigabytes of packs, comprising tens or hundreds of millions of objects, and the repository doesn't get slow. (Obviously this is a separate issue from having a huge work tree with a million files in it.) In repositories this thoroughly huge, we did find a way to improve memory usage versus git's pack .idx files (bup has '.midx' files that combine multiple indexes into one, thus reducing the binary search steps). But this only matters when you get well over 10 gigabytes of stuff and you're wading through it using crappy python code (as bup does) and frequently inserting a million objects at a time (as bup does). The git usage pattern is much simpler and therefore faster. How big is your .git directory and what performance problems do you see? I assume you've done 'git gc' to clean up all the loose objects, right? 
Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
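The remark about '.midx' files reducing binary search steps can be illustrated with a toy sketch. This is not bup's actual file format or code; the function names are made up, and small integers stand in for object IDs. The point is only that N separate sorted indexes may need up to N binary searches per lookup, while one merged index needs a single search:

```python
import bisect


def lookup_separate(indexes, key):
    """Binary-search each sorted index in turn.

    Returns (found, number_of_indexes_searched).
    """
    searched = 0
    for idx in indexes:
        searched += 1
        i = bisect.bisect_left(idx, key)
        if i < len(idx) and idx[i] == key:
            return True, searched
    return False, searched


def merge_indexes(indexes):
    """Combine several sorted indexes into one sorted, de-duplicated index."""
    return sorted(set().union(*indexes))


def lookup_merged(merged, key):
    """A single binary search over the combined index."""
    i = bisect.bisect_left(merged, key)
    return i < len(merged) and merged[i] == key


if __name__ == "__main__":
    packs = [[1, 5, 9], [2, 6], [3, 7, 8]]  # stand-ins for per-pack indexes
    merged = merge_indexes(packs)
    print(lookup_separate(packs, 7), lookup_merged(merged, 7))
```

A miss is the worst case for the separate layout, since every index has to be searched before the lookup can give up.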
* Re: Avery Pennarun's git-subtree? 2010-07-24 1:20 ` Avery Pennarun @ 2010-07-24 19:40 ` skillzero 2010-07-25 1:47 ` Nguyen Thai Ngoc Duy 2010-07-26 13:13 ` Jakub Narebski 2010-07-26 16:37 ` Marc Branchaud 1 sibling, 2 replies; 58+ messages in thread From: skillzero @ 2010-07-24 19:40 UTC (permalink / raw) To: Avery Pennarun Cc: Marc Branchaud, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Fri, Jul 23, 2010 at 6:20 PM, Avery Pennarun <apenwarr@gmail.com> wrote: > On Fri, Jul 23, 2010 at 8:58 PM, <skillzero@gmail.com> wrote: >> On Fri, Jul 23, 2010 at 3:50 PM, Avery Pennarun <apenwarr@gmail.com> wrote: >>> Honest question: do you care about the wasted disk space and download >>> time for these extra files? Or just the fact that git gets slow when >>> you have them? >> >> I have the similar situation to the original poster (huge trees) and >> for me it's all three: disk space, download time, and performance. My >> tree has a few relatively small (< 20 MB) shared directories of common >> code, a few large (2-6 GB) directories of code for OS's, and then >> several medium size (< 500 MB) directories for application code. The >> application developers only care about the app+shared directories (and >> are very annoyed by the massive space and performance impact of the OS >> directories). > > Given how cheap disk space is nowadays, I'm curious about this. Are > they really just annoyed by the performance problem, and they complain > about the extra size because they blame the performance on the extra > files? Or are they honestly short of disk space? I think it's both space and performance. When you're using SSD drives, storage is still pretty expensive. A 128 GB or less SSD is pretty common in a laptop so you can run out pretty quick, especially when you're working concurrently on a few different branches at the same time. It's useful to keep multiple working copies (e.g. 
git-new-workdir) because rebuild time can be significant when switching branches. > Similarly, are all your developers located at the same office? If so, > then bandwidth ought not be an issue. Bandwidth isn't a big problem because you don't need to re-download the repo very often. However, people work at home a lot where bandwidth is more limited. The biggest complaint I hear about bandwidth is that people tend to re-download when something goes wrong (i.e. inexperience with git resulting in a repository they can't recover due to git resets, etc). > I'm pushing extra hard on this because I believe there are lots of > opportunities to just improve git performance on huge repositories. > And if the only *real* reason people need to split repositories is > that performance goes down, then that's fixable, and you may need > neither git-submodule nor git-subtree. Performance degradation is my biggest complaint with large repositories. Your inotify/FSEvents/etc daemon idea sounds interesting to deal with the stat issue. > This is indeed a problem with large repositories. Of course, > splitting them with git-submodule is kind of cheating, because it just > makes git-status *not look* to see if those files are dirty or not. > If they are dirty and you forget to commit them, you'll never know > until someone tells you later. It would be functionally equivalent to > just have git-status not look inside certain subdirs of a single > repository. I think it's only cheating if you're using all of the submodules. The main purpose of submodules for me (although I don't currently use submodules) would be so I don't need to keep modules on disk that I don't care about. If a developer is working on an app, they don't need the OS directories/modules so they get much faster git status/etc and there wouldn't be other directories to have dirty files in. That said, if I was using git submodule, I'd want git status to show me all the submodules that were checked out. 
>> (although just having all those objects in >> the .git directory still slows it down quite a bit). > > You're the second person who has mentioned this today (the first one > was to me in a private email). I'd like to understand this better. What I'm basing this on is that even when I'm using a sparse checkout such that I have only a small subset of the files in my working directory, git status seems significantly slower for me than an equivalent git repository that only has that subset of files. That's not very scientific, but that's what made me think just having a large .git directory with lots of objects/history slows down git status even if the working copy doesn't have a lot of files. I will try to experiment and see if I can narrow it down with some real numbers. BTW...what's the policy on CC'ing people on git mailing list replies? Should it be trimmed or not? I've received complaints in the past, but I was never really clear what the recommended policy is. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-24 19:40 ` skillzero @ 2010-07-25 1:47 ` Nguyen Thai Ngoc Duy 2010-07-28 22:27 ` Jakub Narebski 2010-07-26 13:13 ` Jakub Narebski 1 sibling, 1 reply; 58+ messages in thread From: Nguyen Thai Ngoc Duy @ 2010-07-25 1:47 UTC (permalink / raw) To: skillzero Cc: Avery Pennarun, Marc Branchaud, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Sun, Jul 25, 2010 at 5:40 AM, <skillzero@gmail.com> wrote: >>> (although just having all those objects in >>> the .git directory still slows it down quite a bit). >> >> You're the second person who has mentioned this today (the first one >> was to me in a private email). I'd like to understand this better. > > What I'm basing this on is that even when I'm using a sparse checkout > such that I have only a small subset of the files in my working > directory, git status seems significantly slower for me than an > equivalent git repository that only has that subset of files. That's > not very scientific, but that's what made me think just having a large > .git directory with lots of objects/history slows down git status even > if the working copy doesn't have a lot of files. Hmm... I recall I experienced some slower operations on webkit with sparse checkout too. > > I will try to experiment and see if I can narrow it down with some real numbers. Yes, I'd appreciate that. By the way, how hard is it to use git-replace to implement narrow clone? -- Duy ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-25 1:47 ` Nguyen Thai Ngoc Duy @ 2010-07-28 22:27 ` Jakub Narebski 0 siblings, 0 replies; 58+ messages in thread From: Jakub Narebski @ 2010-07-28 22:27 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy Cc: skillzero, Avery Pennarun, Marc Branchaud, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Sunday, 25 July 2010 03:47, Nguyen Thai Ngoc Duy wrote: > > By the way, how hard is it to use git-replace to implement narrow clone? I don't think that git-replace should be used to implement narrow clone, although it could probably be abused to do so. The refs/replaces mechanism is about static replacements... -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-24 19:40 ` skillzero 2010-07-25 1:47 ` Nguyen Thai Ngoc Duy @ 2010-07-26 13:13 ` Jakub Narebski 1 sibling, 0 replies; 58+ messages in thread From: Jakub Narebski @ 2010-07-26 13:13 UTC (permalink / raw) To: skillzero Cc: Avery Pennarun, Marc Branchaud, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Sat, Jul 24, 2010, skillzero@gmail.com wrote: > On Fri, Jul 23, 2010 at 6:20 PM, Avery Pennarun <apenwarr@gmail.com> wrote: >> On Fri, Jul 23, 2010 at 8:58 PM, <skillzero@gmail.com> wrote: >>> On Fri, Jul 23, 2010 at 3:50 PM, Avery Pennarun <apenwarr@gmail.com> wrote: >> This is indeed a problem with large repositories. Of course, >> splitting them with git-submodule is kind of cheating, because it just >> makes git-status *not look* to see if those files are dirty or not. >> If they are dirty and you forget to commit them, you'll never know >> until someone tells you later. It would be functionally equivalent to >> just have git-status not look inside certain subdirs of a single >> repository. > > I think it's only cheating if you're using all of the submodules. The > main purpose of submodules for me (although I don't currently use > submodules) would be so I don't need to keep modules on disk that I > don't care about. If a developer is working on an app, they don't need > the OS directories/modules so they get much faster git status/etc and > there wouldn't be other directories to have dirty files in. [...] There are two issues that make submodules or git-subtree a better solution. If you work with subprojects via upstream subproject repository, and you don't always need / want all subprojects, git-submodule is better. If you always have checked out all subprojects, and you edit them in superproject, git-subtree is better. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-24 1:20 ` Avery Pennarun 2010-07-24 19:40 ` skillzero @ 2010-07-26 16:37 ` Marc Branchaud 2010-07-26 16:41 ` Linus Torvalds 1 sibling, 1 reply; 58+ messages in thread From: Marc Branchaud @ 2010-07-26 16:37 UTC (permalink / raw) To: Avery Pennarun Cc: skillzero, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On 10-07-23 09:20 PM, Avery Pennarun wrote: > > I'm pushing extra hard on this because I believe there are lots of > opportunities to just improve git performance on huge repositories. > And if the only *real* reason people need to split repositories is > that performance goes down, then that's fixable, and you may need > neither git-submodule nor git-subtree. I think I should mention one aspect of what we're doing, which is that a lot of our submodules are based on external code, and that we occasionally need to modify or customize some of that code. So it's quite nice for us to maintain private git mirrors of the external repos, with our own private branches that contain our modifications. Although we want to get much of our changes incorporated into the upstream code bases, upstream release cycles are rarely in sync with ours. So it's very convenient for us to have our external-code modifications contained in private branches in our private mirrors, and to rebase those branches to keep up with upstream releases. We also often use these private branches to maintain the code that integrates the external code bases into our overall build system. I mention this purely because this pattern is so convenient that I don't want to see it get lost in whatever may arise from this discussion. M. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-26 16:37 ` Marc Branchaud @ 2010-07-26 16:41 ` Linus Torvalds 2010-07-26 17:36 ` Bryan Larsen 2010-07-27 18:28 ` Avery Pennarun 0 siblings, 2 replies; 58+ messages in thread From: Linus Torvalds @ 2010-07-26 16:41 UTC (permalink / raw) To: Marc Branchaud Cc: Avery Pennarun, skillzero, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano On Mon, Jul 26, 2010 at 9:37 AM, Marc Branchaud <marcnarc@xiplink.com> wrote: > > I think I should mention one aspect of what we're doing, which is that a lot > of our submodules are based on external code, and that we occasionally need > to modify or customize some of that code. So it's quite nice for us to > maintain private git mirrors of the external repos, with our own private > branches that contain our modifications. Although we want to get much of our > changes incorporated into the upstream code bases, upstream release cycles > are rarely in sync with ours. THIS. This is why I always thought that submodules absolutely have to be commits, not trees. It's why the git submodule data structures are done the way they are. Anything that makes the submodule just a tree is fundamentally broken, I think. That said, I'm not competent to comment on the actual user interface issues. I can well believe that git-subtree has a nicer interface. Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-26 16:41 ` Linus Torvalds @ 2010-07-26 17:36 ` Bryan Larsen 2010-07-26 17:48 ` Linus Torvalds 2010-07-27 18:28 ` Avery Pennarun 1 sibling, 1 reply; 58+ messages in thread From: Bryan Larsen @ 2010-07-26 17:36 UTC (permalink / raw) To: Linus Torvalds Cc: Marc Branchaud, Avery Pennarun, skillzero, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, git, Junio C Hamano On 10-07-26 12:41 PM, Linus Torvalds wrote: > On Mon, Jul 26, 2010 at 9:37 AM, Marc Branchaud<marcnarc@xiplink.com> wrote: >> >> I think I should mention one aspect of what we're doing, which is that a lot >> of our submodules are based on external code, and that we occasionally need >> to modify or customize some of that code. So it's quite nice for us to >> maintain private git mirrors of the external repos, with our own private >> branches that contain our modifications. Although we want to get much of our >> changes incorporated into the upstream code bases, upstream release cycles >> are rarely in sync with ours. > > THIS. > > This is why I always thought that submodules absolutely have to be > commits, not trees. It's why the git submodule data structures are > done the way they are. Anything that makes the submodule just a tree > is fundamentally broken, I think. > > That said, I'm not competent to comment on the actual user interface > issues. I can well believe that git-subtree has a nicer interface. > > Linus > To me, that's what git-subtree is: an internal private mirror of an external repo. Using git submodule moves that into a separately managed repo, which is just unnecessary hassle. Why maintain a repo called "clone of library X for project A" when you can just stick it inside of project A without any downsides? For us, changes are made in the superproject and tested in the superproject. Once they're tested, a git subtree push or a git subtree split pushes the patches to the subproject. 
Once the subproject has accepted the patches, a git subtree pull merges them. Same workflow as the "private git mirror of external repo" listed above, just without the hassle of having another repo to manage. Bryan ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-26 17:36 ` Bryan Larsen @ 2010-07-26 17:48 ` Linus Torvalds 0 siblings, 0 replies; 58+ messages in thread From: Linus Torvalds @ 2010-07-26 17:48 UTC (permalink / raw) To: Bryan Larsen Cc: Marc Branchaud, Avery Pennarun, skillzero, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, git, Junio C Hamano On Mon, Jul 26, 2010 at 10:36 AM, Bryan Larsen <bryan.larsen@gmail.com> wrote: > > To me, that's what git-subtree is: an internal private mirror of an external > repo. Using git submodule moves that into a separately managed repo, which > is just unnecessary hassle. Why maintain repo called "clone of library X > for project A" when you can just stick it inside of project A without any > downsides? Without any downsides? What about merging? What about complex history? IOW, what about _anything_ but a few extra one-liner patches? Background: the only time I ever used CVS modules, we had submodules for things like gcc, binutils, etc. And maintained them separately from upstream for _years_. Not with some simple one-liner fixes, but with big fundamental changes that couldn't be sent upstream (and wouldn't have been accepted anyway) etc. THAT is the problem space. Not "just a mirror of another project". Linus ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-26 16:41 ` Linus Torvalds 2010-07-26 17:36 ` Bryan Larsen @ 2010-07-27 18:28 ` Avery Pennarun 2010-07-27 20:25 ` Junio C Hamano 1 sibling, 1 reply; 58+ messages in thread From: Avery Pennarun @ 2010-07-27 18:28 UTC (permalink / raw) To: Linus Torvalds Cc: Marc Branchaud, skillzero, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano On Mon, Jul 26, 2010 at 09:41:42AM -0700, Linus Torvalds wrote: > On Mon, Jul 26, 2010 at 9:37 AM, Marc Branchaud <marcnarc@xiplink.com> wrote: > > > > I think I should mention one aspect of what we're doing, which is that a lot > > of our submodules are based on external code, and that we occasionally need > > to modify or customize some of that code. So it's quite nice for us to > > maintain private git mirrors of the external repos, with our own private > > branches that contain our modifications. Although we want to get much of our > > changes incorporated into the upstream code bases, upstream release cycles > > are rarely in sync with ours. > > THIS. > > This is why I always thought that submodules absolutely have to be > commits, not trees. It's why the git submodule data structures are > done the way they are. Anything that makes the submodule just a tree > is fundamentally broken, I think. I agree completely. The major failing of git-subtree is that it uses tree->tree links instead of tree->commit links. This was necessary only because git fundamentally *mistreats* tree->commit links: it refuses to push or fetch through them automatically. That is, when I fetch a superproject that has a tree->commit link in it, git won't fetch the subproject's history starting at the targeted commit, even if the remote repo *has* that history. And if I make a patch to the subproject, pushing the superproject won't push that patch. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
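To make the tree->tree versus tree->commit distinction concrete, here is a small sketch that sorts the entries of a tree by the type of object they point at. It assumes input in the `git ls-tree` text format (mode, type, object name, then a tab and the path). The sample entries, object names, and the `classify_entries` helper are made up for illustration. An entry with mode 160000 and type `commit` is a gitlink, which is how a submodule is recorded; a subdirectory merged in by git-subtree appears as an ordinary `tree` entry:

```python
# Made-up sample in `git ls-tree` format; the object names are placeholders.
SAMPLE_LS_TREE = (
    "100644 blob 1111111111111111111111111111111111111111\tMakefile\n"
    "040000 tree 2222222222222222222222222222222222222222\tshared\n"
    "160000 commit 3333333333333333333333333333333333333333\tlibfoo\n"
)


def classify_entries(ls_tree_output):
    """Group tree-entry paths by the type of object each entry links to."""
    links = {"blob": [], "tree": [], "commit": []}
    for line in ls_tree_output.splitlines():
        meta, path = line.split("\t", 1)
        mode, obj_type, _sha = meta.split()
        links.setdefault(obj_type, []).append(path)
    return links


if __name__ == "__main__":
    links = classify_entries(SAMPLE_LS_TREE)
    print("tree->tree links:  ", links["tree"])
    print("tree->commit links:", links["commit"])
```

In this sample, `shared` is reachable from the superproject's own history, while `libfoo` only names a commit whose history lives in another repository. That is the link the message says git will not fetch or push through automatically.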
* Re: Avery Pennarun's git-subtree? 2010-07-27 18:28 ` Avery Pennarun @ 2010-07-27 20:25 ` Junio C Hamano 2010-07-27 20:57 ` Avery Pennarun 0 siblings, 1 reply; 58+ messages in thread From: Junio C Hamano @ 2010-07-27 20:25 UTC (permalink / raw) To: Avery Pennarun Cc: Linus Torvalds, Marc Branchaud, skillzero, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git Avery Pennarun <apenwarr@gmail.com> writes: > On Mon, Jul 26, 2010 at 09:41:42AM -0700, Linus Torvalds wrote: > >> On Mon, Jul 26, 2010 at 9:37 AM, Marc Branchaud <marcnarc@xiplink.com> wrote: >> > >> > I think I should mention one aspect of what we're doing, which is that a lot >> > of our submodules are based on external code, and that we occasionally need >> > to modify or customize some of that code. So it's quite nice for us to >> > maintain private git mirrors of the external repos, with our own private >> > branches that contain our modifications. Although we want to get much of our >> > changes incorporated into the upstream code bases, upstream release cycles >> > are rarely in sync with ours. >> >> THIS. >> >> This is why I always thought that submodules absolutely have to be >> commits, not trees. It's why the git submodule data structures are >> done the way they are. Anything that makes the submodule just a tree >> is fundamentally broken, I think. > > I agree completely. The major failing of git-subtree is that it uses > tree->tree links instead of tree->commit links. > > This was necessary only because git fundamentally *mistreats* tree->commit > links: it refuses to push or fetch through them automatically. I do not think that is so "fundamental" as you seem to think. Isn't it just the matter of how the default UI of object transfer commands (like push and fetch) are set up? 
Admittedly, the way the default UI is set up is to strongly favor the early design decision we made back when Linus did his initial "gitlink" implementation, which is "separate project lives in a separate repository, and not having to check out any subproject should be the norm for using a superproject". Some "recursive" operations have been added to commands for which it makes sense (e.g. "clone --recursive") by people who cared enough. Even though there are a few other commands that shouldn't ever learn the recursive mode (e.g. "commit --recursive -m $msg" would not make sense), there still are some commands where a similar "--recursive" option would make sense but haven't learned it (e.g. "push --recursive"). I also consider it merely a lack of UI enhancement that you have to clone the submodule again (or cannot switch to a clean slate very easily) when switching between revisions of superproject before and after you add a submodule, and nothing fundamental. When switching back in history to lose a recent submodule, the user experience should be like switching to a revision that didn't have a directory. You shouldn't be able to lose your change in that directory, but if the directory is clean, you should be able to lose it. And when you switch to a more recent revision that has the submodule, you should be able to get it back (again, if you have a precious file there, the checkout should barf). We have added support for having "gitdir: $dir" in a regular file .git exactly because we wanted to be able to stash away the submodule's .git directory somewhere inside .git (e.g. 
.git/modules/<submodulename>) in the superproject when we do that kind of branch switching, so that we can get it back when switching back to a revision with the submodule without having to re-clone (also this presumably would help when you move the submodule in the superproject tree), but there hasn't been further work to make use of this in "git submodule update" (it probably needs to start by teaching "git clone" how to make use of "gitdir: $dir", if anybody is interested). By the way, I also do not think it is such a bad thing that git-subtree does not bind commit into its superproject tree while it is working "natively" (in a "git-subtree" workflow), but allows users to easily split the history into an exportable shape to upstreams of its submodules when such an operation is needed. If you rarely push back to upstreams but constantly consume their changes, that sounds like a reasonable way to go. ^ permalink raw reply [flat|nested] 58+ messages in thread
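The "gitdir: $dir" mechanism mentioned above can be sketched as follows. This is an illustration and not git's own code: the `resolve_git_dir` helper is a made-up name, and the sketch assumes only that a `.git` entry is either a directory or a regular file whose first line is `gitdir: <path>`, with a relative path taken relative to the directory containing that file:

```python
import os


def resolve_git_dir(worktree):
    """Return the repository directory that a working tree's .git refers to."""
    dot_git = os.path.join(worktree, ".git")
    if os.path.isdir(dot_git):
        # Ordinary layout: the repository lives directly in .git/.
        return dot_git
    with open(dot_git, encoding="utf-8") as f:
        line = f.readline().rstrip("\n")
    prefix = "gitdir: "
    if not line.startswith(prefix):
        raise ValueError("not a gitdir file: " + dot_git)
    target = line[len(prefix):]
    # os.path.join keeps an absolute target as-is and resolves a
    # relative one against the working tree.
    return os.path.normpath(os.path.join(worktree, target))
```

For example, a submodule checked out at `mod1` whose `.git` file contains `gitdir: ../.git/modules/mod1` would resolve to a directory stashed inside the superproject's `.git`, which is the layout the message proposes for surviving branch switches without a re-clone.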
* Re: Avery Pennarun's git-subtree? 2010-07-27 20:25 ` Junio C Hamano @ 2010-07-27 20:57 ` Avery Pennarun 2010-07-27 21:14 ` Junio C Hamano 2010-07-27 21:32 ` Jens Lehmann 0 siblings, 2 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-27 20:57 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, Marc Branchaud, skillzero, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð, Bryan Larsen, git On Tue, Jul 27, 2010 at 4:25 PM, Junio C Hamano <gitster@pobox.com> wrote: > Avery Pennarun <apenwarr@gmail.com> writes: >> I agree completely. The major failing of git-subtree is that it uses >> tree->tree links instead of tree->commit links. >> >> This was necessary only because git fundamentally *mistreats* tree->commit >> links: it refuses to push or fetch through them automatically. > > I do not think that is so "fundamental" as you seem to think. > > Isn't it just the matter of how the default UI of object transfer commands > (like push and fetch) are set up? Well, I call it fundamental because there's currently no way to get the git UI to do otherwise. It's not really just a "default." To depend on this changing would have prevented me from writing git-subtree, which is why I didn't depend on it. However, I agree that it's fixable. Note that the way git treats a checked-out submodule (as you describe below) is also very fundamental to how this works. git-subtree wouldn't have the usability that it does if 'git checkout branchname' didn't work perfectly with all the subtrees, which it currently does, but which it wouldn't if I had relied on tree->commit links. > Some "recursive" operations have been added to commands for which it makes > sense (e.g. "clone --recursive") by people who cared enough. Even though > there are a few other commands that shouldn't ever learn the recursive > mode (e.g. 
"commit --recursive -m $msg" would not make sense), there still > are some commands where a similar "--recursive" option would make sense > but haven't learned it (e.g. "push --recursive"). One problem with this line of reasoning is that "--recursive" is always an option. But if submodules are ever to be easy to use, I think it should be the default (or settable as a default using git config). This would take us a *long* way towards usability (of course, in addition to adding the missing features, as you mention). Also, I haven't tried it, but I think 'git gc' will prune away objects if the only reference to them is a 'commit' link from a tree. This would be undesirable too. > I also consider it merely a lack of UI enhancement that you have to clone > the submodule again (or cannot switch to a clean slate very easily) when > switching between revisions of superproject before and after you add a > submodule, and nothing fundamental. I mostly agree with this. There is one problem I don't know how to solve with this idea, though: what happens when commit A adds a submodule in modules/mod1, commit B removes it, and then commit C re-adds the same submodules in modules/mod1-again? Will it reuse the same submodule .git directory or a new one? Share objects or not? Share branch names or not? Share .git/config or not? Unless you have some kind of "unique id" scheme for submodules, this gets impossible to handle correctly. And the git objects themselves (trees that link to commits) have nowhere to put such things. By comparison, simply putting all the stuff related to all the submodules into the supermodule's repo creates none of these confusing problems. You could even still choose not to checkout individual submodules' trees if you wanted. > When switching back in history to lose a recent submodule, the user > experience should be like switching to a revision that didn't have a > directory. 
You shouldn't be able to lose your change in that directory, > but if the directory is clean, you should be able to lose it. And when > you switch to a more recent revision that has the submodule, you should be > able to get it back (again, if you have a precious file there, the > checkout should barf). It sounds like you're proposing that we delete the entire submodule's directory hierarchy when the submodule commit link goes away. Note that this isn't what happens in the non-submodule case: all the *.o files, for example, in a deleted subdirectory are not automatically deleted by git. And I think this is the behaviour we should expect. With that in mind, the situations where checkout barfs because of a "precious" file should be the same as they are in normal git: it should only be a problem if the files in question differ between the originally-checked-out tree and the newly-checked-out tree. Apologies if that's what you meant in the first place. > We have added support for having "gitdir: $dir" in a regular file .git > exactly because we wanted to be able to stash away the submodule's .git > directory somewhere inside .git (e.g. .git/modules/<submodulename>) in the > superproject when we do that kind of branch switching, so that we can get > it back when switching back to a revision with the submodule without > having to re-clone (also this presumably would help when you move the > submodule in the superproject tree), but there haven't been further work > to make use of this in "git submodule update" (it probably needs to start > by teaching "git clone" how to make use of "gitdir: $dir", if anybody is > interested). I guess the real question is: just how much of a "real" repository do we want a submodule to act like? Thoughts: - object store: I think this should just always be shared with the superproject. There's no reason to separate them that I can see. - branches: should be a way to simply not worry about branches and just use what's in the superproject. 
Other people seem to want to be able to have a set of branches/tags for their submodule. - .git/config: entirely shared? entirely separate? - remotes: I would want my submodules to never do their own pushing/pulling, and leave that to the supermodule; other people seem to disagree. For the particular model I'm proposing, I'm just not sure that *any* of the features of a separate repo are warranted... and having them adds a lot of complication. (In the most basic level, you suddenly need to track .git directories as submodules are added/deleted/moved around when you checkout different revisions of the superproject, and there seems to be no way to do that elegantly.) Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
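[Editorial sketch] The "gitdir: $dir" idea quoted above can be illustrated with plain file operations. This is only a sketch under assumptions: the function name absorb_gitdir and its arguments are made up for the example, and the .git/modules/<name> layout is the one suggested in the quoted text, not a description of what "git submodule update" did at the time of this thread.

```shell
#!/bin/sh
# Sketch only: stash a submodule's .git directory inside the superproject
# and leave a "gitdir:" gitfile in its place. absorb_gitdir is a
# hypothetical helper, not a git command.
absorb_gitdir() {
    super=$1      # absolute path to the superproject work tree
    sub_path=$2   # submodule path relative to the superproject
    name=$3       # name used for the stashed directory

    # Nothing to do unless the submodule still has a real .git directory.
    [ -d "$super/$sub_path/.git" ] || return 1

    target="$super/.git/modules/$name"
    mkdir -p "$(dirname "$target")" || return 1
    mv "$super/$sub_path/.git" "$target" || return 1

    # A gitfile is a regular file named .git holding one "gitdir: <dir>" line.
    printf 'gitdir: %s\n' "$target" > "$super/$sub_path/.git"
}
```

With the directory stashed this way, removing the submodule's work tree when switching revisions no longer throws away its repository data. The open question raised in the message (which stashed directory a re-added submodule should reuse) is untouched by this sketch: the name argument is exactly the "unique id" that has to come from somewhere.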
* Re: Avery Pennarun's git-subtree? 2010-07-27 20:57 ` Avery Pennarun @ 2010-07-27 21:14 ` Junio C Hamano 2010-07-27 21:32 ` Jens Lehmann 1 sibling, 0 replies; 58+ messages in thread From: Junio C Hamano @ 2010-07-27 21:14 UTC (permalink / raw) To: Avery Pennarun Cc: Junio C Hamano, Linus Torvalds, Marc Branchaud, skillzero, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð, Bryan Larsen, git Avery Pennarun <apenwarr@gmail.com> writes: > ... There is one problem I don't know how to > solve with this idea, though: what happens when commit A adds a > submodule in modules/mod1, commit B removes it, and then commit C > re-adds the same submodules in modules/mod1-again? Will it reuse the > same submodule .git directory or a new one? Share objects or not? > Share branch names or not? Share .git/config or not? > > Unless you have some kind of "unique id" scheme for submodules, this > gets impossible to handle correctly. And the git objects themselves > (trees that link to commits) have nowhere to put such things. I vaguely recall that we had already discussed and more or less resolved it at the design level at some point. Looking for "three-level thing" in the gmane archive might be beneficial, although all I recall are these three words as search keywords, and I do not have a detailed recollection of the actual discussion ;-) ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-27 20:57 ` Avery Pennarun 2010-07-27 21:14 ` Junio C Hamano @ 2010-07-27 21:32 ` Jens Lehmann 1 sibling, 0 replies; 58+ messages in thread From: Jens Lehmann @ 2010-07-27 21:32 UTC (permalink / raw) To: Avery Pennarun Cc: Junio C Hamano, Linus Torvalds, Marc Branchaud, skillzero, Jakub Narebski, Ævar Arnfjörð, Bryan Larsen, git On 27.07.2010 22:57, Avery Pennarun wrote: > One problem with this line of reasoning is that "--recursive" is > always an option. But if submodules are ever to be easy to use, I > think it should be the default (or settable as a default using git > config). This would take us a *long* way towards usability (of > course, in addition to adding the missing features, as you mention). And that is exactly what I am currently doing: - I already taught diff and status to always recurse (and just sent a patch to add a config option for that behavior, as some users either can't pay the performance costs or don't want to see submodules show up as modified just because they contain untracked files). - I posted a WIP patch doing recursive checkouts (that is basically working but I still have to put in the safety checks so that no modifications to submodules are accidentally discarded unless -f is used). - I am working on a recursive fetch too. And then there is other stuff on my list to be tackled; I try to fix these issues so that the most annoying problems get solved first. Unfortunately that does not proceed as fast as I wished, but hopefully I can show some progress in the near future. Of course any help would be greatly appreciated ;-) ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-24 0:58 ` skillzero 2010-07-24 1:20 ` Avery Pennarun @ 2010-07-26 8:56 ` Jakub Narebski 2010-07-27 18:36 ` Avery Pennarun 1 sibling, 1 reply; 58+ messages in thread From: Jakub Narebski @ 2010-07-26 8:56 UTC (permalink / raw) To: skillzero Cc: Avery Pennarun, Marc Branchaud, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Sat, Jul 24, 2010, skillzero@gmail.com napisał: > On Fri, Jul 23, 2010 at 3:50 PM, Avery Pennarun <apenwarr@gmail.com> wrote: > > > Honest question: do you care about the wasted disk space and download > > time for these extra files? Or just the fact that git gets slow when > > you have them? > > I have the similar situation to the original poster (huge trees) and > for me it's all three: disk space, download time, and performance. My > tree has a few relatively small (< 20 MB) shared directories of common > code, a few large (2-6 GB) directories of code for OS's, and then > several medium size (< 500 MB) directories for application code. The > application developers only care about the app+shared directories (and > are very annoyed by the massive space and performance impact of the OS > directories). The firmware-only developers only care about OS+shared > and are mildly annoyed by the medium space and performance impact of > the app directories. I work on all of the pieces, but even I would > prefer to have things separated so when I work on the apps, git > status/etc doesn't take a big hit for close to a million files in the > OS directories (particularly when doing git status on Windows). Even > when using the -uno option to git status, it's still pretty slow (over > a minute). > > git-submodule might be technically possible in this situation, but > having to commit and push each submodule and then commit and push the > super module makes it slightly worse than just dealing with the > space/download/performance issues of one huge repository. 
But this is just a matter of improving the UI for dealing with submodules, isn't it? For example, having "git commit --recursive" would help with 'having to commit each submodule', though it is unclear how you would write commit messages then: perhaps the supermodule commit message could by default be composed out of the submodule commit messages (if any). "git push --recursive" (or some support for push in "git remote") would help with 'having to push each submodule', wouldn't it? -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-26 8:56 ` Jakub Narebski @ 2010-07-27 18:36 ` Avery Pennarun 2010-07-28 13:36 ` Marc Branchaud 2010-07-28 18:32 ` Jakub Narebski 0 siblings, 2 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-27 18:36 UTC (permalink / raw) To: Jakub Narebski Cc: skillzero, Marc Branchaud, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Mon, Jul 26, 2010 at 10:56:58AM +0200, Jakub Narebski wrote: > On Sat, Jul 24, 2010, skillzero@gmail.com napisał: > > git-submodule might be technically possible in this situation, but > > having to commit and push each submodule and then commit and push the > > super module makes it slightly worse than just dealing with the > > space/download/performance issues of one huge repository. > > But this is just a matter for improving UI for dealing with submodules, > isn't it. For example having "git commit --recursive" would help > with 'having to commit each submodule', though how you would write commit > messages then: perhaps supermodule commit message could be by default > composed out of submodules commits (if any). "git push --recursive" > (or some support for push in "git remote") would help with 'having to > push each submodule'. For "recursive" commit, for my own workflow, I would rather have it work like this: from the toplevel, I can 'git commit' any set of files, as long as they all fall inside a particular submodule. That is, if I do git commit mod1/*.c mod2/*.c it should reject it (with a helpful message), because the commit would cross submodule boundaries. But if I do git commit mod1/*.c I think it should create a new commit in mod1, leave my superproject pointing at that new commit, and stop (ie. without the superproject having committed the new commit pointer). Why? Because my normal workflow is: - make a bunch of superproject/submodule changes until they work. 
- commit the submodule changes with a submodule-relevant message - commit the superproject change with a supermodule-relevant message I wouldn't want to share commit messages between the two, so actually having a single commit process be "recursive" would not do me any good. However, pushing is a separate issue entirely. Having push be recursive would be easy, but it doesn't solve the *real* problem with pushing: git doesn't know what branch to push to in the submodule, and the submodule most likely isn't pointing at a pushable repo at all, even if the supermodule is. This is why I keep coming back to the idea that I really want to push all the submodule objects into the superproject's repo. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
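[Editorial sketch] The rule proposed in the message above (accept a commit only if all named paths fall inside one submodule) can be sketched as a path check. Everything here is hypothetical: SUBMODULES is a hand-written list standing in for whatever .gitmodules would provide, and owner_of and single_owner are made-up names, not git commands or options.

```shell
#!/bin/sh
# Hypothetical stand-in for the list of submodule paths; a real tool
# would read these from the superproject rather than hard-code them.
SUBMODULES="mod1 mod2"

# Print the submodule that contains the given path, or "." when the
# path belongs to the superproject itself.
owner_of() {
    for sm in $SUBMODULES; do
        case $1 in
            "$sm"|"$sm"/*) echo "$sm"; return ;;
        esac
    done
    echo .
}

# Succeed (printing the common owner) only if every path has the same
# owner; otherwise complain and fail, as the proposal asks.
single_owner() {
    first=$(owner_of "$1")
    for p in "$@"; do
        if [ "$(owner_of "$p")" != "$first" ]; then
            echo "error: paths cross a submodule boundary" >&2
            return 1
        fi
    done
    echo "$first"
}
```

Under these assumptions, "single_owner mod1/a.c mod1/b.c" prints mod1, while "single_owner mod1/a.c mod2/b.c" fails with the error message. Nested submodules are not handled: the first matching entry in SUBMODULES wins.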
* Re: Avery Pennarun's git-subtree? 2010-07-27 18:36 ` Avery Pennarun @ 2010-07-28 13:36 ` Marc Branchaud 2010-07-28 18:32 ` Jakub Narebski 1 sibling, 0 replies; 58+ messages in thread From: Marc Branchaud @ 2010-07-28 13:36 UTC (permalink / raw) To: Avery Pennarun Cc: Jakub Narebski, skillzero, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On 10-07-27 02:36 PM, Avery Pennarun wrote: > > For "recursive" commit, for my own workflow, I would rather have it work > like this: from the toplevel, I can 'git commit' any set of files, as long > as they all fall inside a particular submodule. That is, if I do > > git commit mod1/*.c mod2/*.c > > it should reject it (with a helpful message), because the commit would cross > submodule boundaries. But if I do > > git commit mod1/*.c > > I think it should create a new commit in mod1, leave my superproject > pointing at that new commit, and stop (ie. without the superproject having > committed the new commit pointer). I think that makes perfect sense. I'd also want the updated pointer to be unstaged. > Why? Because my normal workflow is: > > - make a bunch of superproject/submodule changes until they work. > - commit the submodule changes with a submodule-relevant message > - commit the superproject change with a supermodule-relevant message > > I wouldn't want to share commit messages between the two, so actually having > a single commit process be "recursive" would not do me any good. That's the workflow I'd like to follow as well. In terms of achieving this workflow with submodules and branching, what's required is that branching in the superproject takes the submodules off of the detached HEAD and onto something that won't get automatically garbage-collected in a few weeks. That could be done simply by applying the superproject's branch to all the submodules. 
A command like superproject/$ git branch foo origin/master would create the submodule branches on the commits identified for the submodules in the superproject's origin/master commit. Making that work smoothly, I think, requires all the submodules' .git directories, so the branch name can be recorded in all of them. And so I think that either "git fetch" has to recursively obtain (and update) all submodule repos, or there needs to be some kind of on-demand retrieval mechanism. Other ideas for grand-unified object stores (which I haven't been following too closely) could work as well. So with unified branching and available .git directories, I think a recursive checkout is doable and makes sense. I'd still like to control which submodules a checkout might recurse through, but I think the sparse-checkout system is the way to handle that. I also suspect that non-fast-forward submodule merges could be workable, where regular merges are performed individually in the submodules before merging in the superproject. One final, somewhat orthogonal thought: I think that "git commit submodule-dir" should require -f if the remote associated with the submodule doesn't have the commit ID you're trying to commit. > However, pushing is a separate issue entirely. Having push be recursive > would be easy, but it doesn't solve the *real* problem with pushing: git > doesn't know what branch to push to in the submodule, and the submodule most > likely isn't pointing at a pushable repo at all, even if the supermodule is. > This is why I keep coming back to the idea that I really want to push all > the submodule objects into the superproject's repo. I agree that recursive pushing doesn't make much sense, so there's no need to try to implement it. I think having "git commit" reject unpushed submodule updates in the superproject goes a long way to alleviating misordered pushing. M. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-27 18:36 ` Avery Pennarun 2010-07-28 13:36 ` Marc Branchaud @ 2010-07-28 18:32 ` Jakub Narebski 1 sibling, 0 replies; 58+ messages in thread From: Jakub Narebski @ 2010-07-28 18:32 UTC (permalink / raw) To: Avery Pennarun Cc: skillzero, Marc Branchaud, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Tue, Jul 27, 2010, Avery Pennarun wrote: > On Mon, Jul 26, 2010 at 10:56:58AM +0200, Jakub Narebski wrote: > > On Sat, Jul 24, 2010, skillzero@gmail.com napisał: > > > > > > git-submodule might be technically possible in this situation, but > > > having to commit and push each submodule and then commit and push the > > > super module makes it slightly worse than just dealing with the > > > space/download/performance issues of one huge repository. > > > > But this is just a matter for improving UI for dealing with submodules, > > isn't it. For example having "git commit --recursive" would help > > with 'having to commit each submodule', though how you would write commit > > messages then: perhaps supermodule commit message could be by default > > composed out of submodules commits (if any). "git push --recursive" > > (or some support for push in "git remote") would help with 'having to > > push each submodule'. > > For "recursive" commit, for my own workflow, I would rather have it work > like this: from the toplevel, I can 'git commit' any set of files, as long > as they all fall inside a particular submodule. That is, if I do > > git commit mod1/*.c mod2/*.c > > it should reject it (with a helpful message), because the commit would cross > submodule boundaries. But if I do > > git commit mod1/*.c > > I think it should create a new commit in mod1, leave my superproject > pointing at that new commit, and stop (ie. without the superproject having > committed the new commit pointer). > > Why? 
Because my normal workflow is: > > - make a bunch of superproject/submodule changes until they work. > - commit the submodule changes with a submodule-relevant message > - commit the superproject change with a supermodule-relevant message > > I wouldn't want to share commit messages between the two, so actually having > a single commit process be "recursive" would not do me any good. I think it is quite a good idea, but it covers only one of the three most commonly used (I think) forms of git-commit: * git commit <files> # your proposal covers this * git commit -a # but I think either this * git commit # or this is actually more common Also, "git commit ." in a submodule cannot be done in this proposal, because it is indistinguishable from "git commit <submodule>", which commits the state of the submodule in the supermodule. Perhaps it would be a matter of porting "--relative=<path>" or adding a "--submodule=<name>" option to git-commit? > However, pushing is a separate issue entirely. Having push be recursive > would be easy, but it doesn't solve the *real* problem with pushing: git > doesn't know what branch to push to in the submodule, and the submodule most > likely isn't pointing at a pushable repo at all, even if the supermodule is. > This is why I keep coming back to the idea that I really want to push all > the submodule objects into the superproject's repo. I think there should be two easy-to-obtain variants of recursive clone: 1. The current one, where each submodule gets its own repository in the place it is checked out in the working area (worktree) of the supermodule. 2. A new one, where submodule repositories are in .git/submodules/<name> in the supermodule's GIT_DIR, and submodules use gitfiles (probably with some notation that the path is relative to the supermodule, like e.g. //<path> or .../<path>). I'm not sure, though, how it would translate into pushing... -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 22:50 ` Avery Pennarun 2010-07-24 0:58 ` skillzero @ 2010-07-24 20:07 ` Sverre Rabbelier 2010-07-26 8:51 ` Jakub Narebski 2010-07-26 15:15 ` Marc Branchaud 3 siblings, 0 replies; 58+ messages in thread From: Sverre Rabbelier @ 2010-07-24 20:07 UTC (permalink / raw) To: Avery Pennarun Cc: Marc Branchaud, Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds Heya, On Fri, Jul 23, 2010 at 17:50, Avery Pennarun <apenwarr@gmail.com> wrote: > IMHO, the correct answer here is to have an inotify-based daemon prod > at the .git/index automatically when files get updated, so that git > itself doesn't have to stat/readdir through the entire tree in order > to do any of its operations. (Windows also has something like inotify > that would work.) If you had this, then git > status/diff/checkout/commit would be just as fast with zillions of > files as with 10 files. Sooner or later, if nobody implements this, I > promise I'll get around to it since inotify is actually easy to code > for :) From what I've heard, both SVN and Mercurial have something like that, and it's incredibly unstable and icky and nasty and bad and will eat your babies. Then again, I don't have any experience with inotify, so if you say that it's all good and awesome, who am I to doubt that :). -- Cheers, Sverre Rabbelier ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Avery Pennarun's git-subtree? 2010-07-23 22:50 ` Avery Pennarun 2010-07-24 0:58 ` skillzero 2010-07-24 20:07 ` Sverre Rabbelier @ 2010-07-26 8:51 ` Jakub Narebski 2010-07-27 19:15 ` Avery Pennarun 2010-07-26 15:15 ` Marc Branchaud 3 siblings, 1 reply; 58+ messages in thread From: Jakub Narebski @ 2010-07-26 8:51 UTC (permalink / raw) To: Avery Pennarun Cc: Marc Branchaud, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Sat, 24 Jul 2010 00:50, Avery Pennarun wrote: > On Fri, Jul 23, 2010 at 11:19 AM, Marc Branchaud <marcnarc@xiplink.com> wrote: >> On 10-07-22 03:41 PM, Avery Pennarun wrote: >>> 1) Sometimes I want to clone only some subdirs of a project >>> 2) Sometimes I don't want the entire history because it's too big. >>> 3) Super huge git repositories start to degrade in performance. >> >> The reason we turned to submodules is precisely to deal with repository size. > > I believe that's very common. > > However, I wonder whether that's actually a good reason for git to > develop better submodules, or actually just a good reason for git to > get better support for handling huge repositories. > > My bup project (http://github.com/apenwarr/bup) is all about huge > repositories. It handles repositories with hundreds of gigabytes, and > trees containing millions of files (entire filesystems), quite nicely. > Of course, it's not a version control system, so it won't solve your > problems. It's just evidence that large repositories are actually > quite manageable without changing the fundamentals of git. There is also git-bigfiles project, although it is more about large [binary] files than large repositories per se (many files, long history). Note that with 'bup' you might not see problems with large repositories because it does not examine code paths that are slow in large repositories (gc, log, path-delimited log). 
>> Our code base encompasses the entire FreeBSD tree plus different versions of >> the Linux kernel, along with various third-party libraries & apps. You don't >> need everything to build a given product (a FreeBSD product doesn't use any >> Linux kernels, for example) but because all the products share common code we >> need to be able to branch and tag the common code along with the uncommon code. Sidenote: I have noticed there very important ability of submodules, which git-subtree lacks, or at least doesn't have it directly, namely ability to tag in submodule separately of tagging superproject as whole (so e.g. superproject v1.6.2 includes subproject 'foo' v0.99 which is foo/v0.99 tag in superproject). >> So a straight "git clone" that would need to fetch all of FreeBSD plus 4 >> different Linux kernels and check all that out is a major problem, especially >> for our automated build system (which could definitely be implemented better, >> but still). > > To be absolutely pedantic, the four linux kernels likely share most of > their objects and so you're only paying the cost (at least during > fetch) of including it once :) > > (If you're actually using git-submodule and each copy of the kernel is > its own module, then it might be cloning the kernel four times > separately, in which case the objects *don't* get shared, so this ends > up being much more expensive than it should be. That could be fixed > by slightly improving git-submodule to share some objects rather than > rearchitecting it though.) This issue is orthogonal to the fact of using submodules, it is a matter of setting up alternates to share object storage. >> In truth it's the checkout that takes the most time by far, >> though commands like git-status also take inconveniently long. > > Yeah, git could stand to be optimized a bit here. 
And since Windows > stats files about 10x slower than Linux, this problem occurs about 10x > sooner on Windows, which makes using git on Windows (which sadly I > have to do sometimes) extremely painful compared to Linux. > > IMHO, the correct answer here is to have an inotify-based daemon prod > at the .git/index automatically when files get updated, so that git > itself doesn't have to stat/readdir through the entire tree in order > to do any of its operations. (Windows also has something like inotify > that would work.) If you had this, then git > status/diff/checkout/commit would be just as fast with zillions of > files as with 10 files. Sooner or later, if nobody implements this, I > promise I'll get around to it since inotify is actually easy to code > for :) IIUC the problem is that inotify is not automatically recursive, so daemon would have to take care of adding inotify trigger to each newly created subdirectory. > Also note that the only reason submodules are faster here is that > they're ignoring possibly important changes. Notably, when you do > 'git status' from the top level, it won't warn you if you have any > not-yet-committed files in any of your submodules. Personally, I > consider that to be really important information, but to obtain it > would make 'git status' take just as long as without submodules, so > you wouldn't get any benefit. (I think nowadays there's a way to get > this recursive status information if you want it, but it'll be slow of > course.) Errr... didn't it got improved in recent git? I think git-status now includes information about submodules if configured so / unless configured otherwise. Isn't it? >> We chose git-submodule over git-subtree mainly because git-submodule lets us >> selectively checkout different parts of our code. (AFAIK sparse checkouts >> aren't yet an option.) 
Sparse checkouts are here, IIRC, but they do not solve the problems of disk space (the objects are still in the repository, even if not checked out) and speed (they still need to be fetched, even if not checked out). > Fair enough. If you could confirm or deny my theory that this is > *entirely* a performance related concern (as opposed to disk space / > download time), that would be helpful. > >> We didn't really consider git-subtree because it's >> not an official part of git, and we didn't want to have to teach (and nag) >> all our developers to install and maintain it in addition to keeping up with >> git itself. > > Arguably, this is a vote for including git-subtree into the core > (which was Bryan's point when he started this thread); it obviously is > being rejected sometimes by git users simply because it's not in the > core, even though it could help them. Well, patch management interfaces such as StGIT, Guilt and TopGit are also outside git code (and should be), and the same goes for GUI tools such as qgit. That shouldn't prevent people from using them ;-) But I am all for having git-subtree in core: we have git-remote, haven't we? Besides, git-subtree fits some workflows better than git-submodule (and vice versa). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 58+ messages in thread
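[Editorial sketch] The point about sparse checkouts above (files vanish from the work tree but their objects stay in the repository) can be seen in a throwaway repository. This follows the core.sparseCheckout recipe from the git-read-tree documentation; the directory names apps/ and os/ and the identity passed to "git commit" are invented for the example.

```shell
#!/bin/sh
# Build a tiny throwaway repository with two top-level directories.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir apps os
echo 'int main(void) { return 0; }' > apps/main.c
echo '/* kernel */' > os/kernel.c
git add .
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "import apps and os"

# Enable sparse checkout and keep only apps/ in the work tree.
git config core.sparseCheckout true
mkdir -p .git/info
echo "apps/" > .git/info/sparse-checkout

# Re-read HEAD so the work tree is updated to match the sparse patterns.
git read-tree -mu HEAD

ls    # the work tree should now list only apps
```

After this, os/kernel.c is absent from the work tree, yet "git rev-parse HEAD:os/kernel.c" still resolves to a blob, which is the disk-space and fetch-time cost described above.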
* Re: Avery Pennarun's git-subtree? 2010-07-26 8:51 ` Jakub Narebski @ 2010-07-27 19:15 ` Avery Pennarun 0 siblings, 0 replies; 58+ messages in thread From: Avery Pennarun @ 2010-07-27 19:15 UTC (permalink / raw) To: Jakub Narebski Cc: Marc Branchaud, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds On Mon, Jul 26, 2010 at 4:51 AM, Jakub Narebski <jnareb@gmail.com> wrote: > On Sat, 24 Jul 2010 00:50, Avery Pennarun wrote: >> My bup project (http://github.com/apenwarr/bup) is all about huge >> repositories. It handles repositories with hundreds of gigabytes, and >> trees containing millions of files (entire filesystems), quite nicely. >> Of course, it's not a version control system, so it won't solve your >> problems. It's just evidence that large repositories are actually >> quite manageable without changing the fundamentals of git. > > There is also git-bigfiles project, although it is more about large > [binary] files than large repositories per se (many files, long history). Right. git-bigfiles is valuable, but it's valuable with or without submodules. (If you have large blobs, submodules won't save you.) bup happens to have its own way of dealing with large files too, but it may not be applicable to git. It does result in lots and lots of smaller objects, though, which is why I know git repositories are fundamentally capable of handling lots and lots of smaller objects :) > Note that with 'bup' you might not see problems with large repositories > because it does not examine code paths that are slow in large repositories > (gc, log, path-delimited log). gc is a huge problem. bup avoids it entirely (it foregoes delta compression); git gc fails completely on such large repositories (100+ GB). There's no reason this has to be true forever, but yes, to support really big repos, git gc would need to be improved somewhat. 
For most reasonably sane repos (a few GB) you can get reasonable performance by just making your biggest packfiles .keep so they don't keep getting repacked all the time. Compared to that, log feels like not a problem at all :) At least performance-wise. The thing that sucks about log using git-subtree, of course, is that you get all these log messages from multiple projects jammed together into a single repo, which is rarely what you want, even if it's fast. I think the "best" solution is a single repo with all your objects, but still keeping the histories of each submodule separate. >> IMHO, the correct answer here is to have an inotify-based daemon prod >> at the .git/index automatically when files get updated, so that git >> itself doesn't have to stat/readdir through the entire tree in order >> to do any of its operations. (Windows also has something like inotify >> that would work.) If you had this, then git >> status/diff/checkout/commit would be just as fast with zillions of >> files as with 10 files. Sooner or later, if nobody implements this, I >> promise I'll get around to it since inotify is actually easy to code >> for :) > > IIUC the problem is that inotify is not automatically recursive, so > daemon would have to take care of adding inotify trigger to each newly > created subdirectory. Yeah, the inotify API is kind of gross that way. But it can be done, and people do. (eg. the beagle project) >> Also note that the only reason submodules are faster here is that >> they're ignoring possibly important changes. Notably, when you do >> 'git status' from the top level, it won't warn you if you have any >> not-yet-committed files in any of your submodules. Personally, I >> consider that to be really important information, but to obtain it >> would make 'git status' take just as long as without submodules, so >> you wouldn't get any benefit. 
(I think nowadays there's a way to get >> this recursive status information if you want it, but it'll be slow of >> course.) > > Errr... didn't it got improved in recent git? I think git-status now > includes information about submodules if configured so / unless configured > otherwise. Isn't it? Yes, but you're still left with the choice between slow (checks all files in all submodules) and not slow (might miss stuff). This isn't a submodule question, really, it's an overall performance question with huge checkouts with or without submodules. >>> We chose git-submodule over git-subtree mainly because git-submodule lets us >>> selectively checkout different parts of our code. (AFAIK sparse checkouts >>> aren't yet an option.) > > Sparse checkouts are here, IIRC, but they do not solve problem of disk > space (they are still in repository, even if not checked out), and speed > (they still need to be fetched, even if not checked out). Hmm, don't mix bandwidth usage (and thus the slowness of fetch) with slowness during everyday usage. I don't mind a slow fetch now and then, but 'git status' should be fast. AFAIK, sparse checkouts *should* make git status faster. If they don't, it's probably just a bug. Have fun, Avery ^ permalink raw reply [flat|nested] 58+ messages in thread
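[Editorial sketch] The .keep suggestion made earlier in this message can be written as a small helper. A pack-<hash>.keep file next to pack-<hash>.pack tells git to leave that pack alone when repacking; the function name keep_largest_pack is made up, and choosing the single largest pack is just one possible policy.

```shell
#!/bin/sh
# Mark the largest packfile in a pack directory as "kept".
# keep_largest_pack is a hypothetical helper, not a git command.
keep_largest_pack() {
    packdir=$1
    largest=
    largest_size=0
    for pack in "$packdir"/pack-*.pack; do
        [ -f "$pack" ] || continue            # the glob matched nothing
        size=$(($(wc -c < "$pack")))          # arithmetic strips padding
        if [ "$size" -gt "$largest_size" ]; then
            largest=$pack
            largest_size=$size
        fi
    done
    [ -n "$largest" ] || return 1
    # Create (or truncate) the marker file next to the chosen pack.
    : > "${largest%.pack}.keep"
    echo "${largest%.pack}.keep"
}
```

A typical call would be keep_largest_pack .git/objects/pack from the top of a repository; deleting the printed .keep file makes the pack eligible for repacking again.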
* Re: Avery Pennarun's git-subtree?
  2010-07-23 22:50 ` Avery Pennarun
  ` (2 preceding siblings ...)
  2010-07-26 8:51 ` Jakub Narebski
@ 2010-07-26 15:15 ` Marc Branchaud
  3 siblings, 0 replies; 58+ messages in thread
From: Marc Branchaud @ 2010-07-26 15:15 UTC (permalink / raw)
  To: Avery Pennarun
  Cc: Jakub Narebski, Jens Lehmann, Ævar Arnfjörð Bjarmason, Bryan Larsen, git, Junio C Hamano, Linus Torvalds

On 10-07-23 06:50 PM, Avery Pennarun wrote:
> On Fri, Jul 23, 2010 at 11:19 AM, Marc Branchaud <marcnarc@xiplink.com> wrote:
>> On 10-07-22 03:41 PM, Avery Pennarun wrote:
>>> 1) Sometimes I want to clone only some subdirs of a project
>>> 2) Sometimes I don't want the entire history because it's too big.
>>> 3) Super huge git repositories start to degrade in performance.
>>
>> The reason we turned to submodules is precisely to deal with repository size.
>
> I believe that's very common.
>
> However, I wonder whether that's actually a good reason for git to
> develop better submodules, or actually just a good reason for git to
> get better support for handling huge repositories.

I think that's a fundamental question, but part of the problem in coming
up with an answer is that there's no agreed-upon definition of how to
handle huge repos. People have provided tools that answer the question in
ways they like, but I think the fact that these issues keep coming up is
proof that git isn't there yet.

>> Our code base encompasses the entire FreeBSD tree plus different versions of
>> the Linux kernel, along with various third-party libraries & apps. You don't
>> need everything to build a given product (a FreeBSD product doesn't use any
>> Linux kernels, for example) but because all the products share common code we
>> need to be able to branch and tag the common code along with the uncommon code.
>
> Honest question: do you care about the wasted disk space and download
> time for these extra files? Or just the fact that git gets slow when
> you have them?

It's not the disk space or the extra download time. It's how long it
takes to check out all those files, and how long it takes to "git status"
in a unified repo.

>> So a straight "git clone" that would need to fetch all of FreeBSD plus 4
>> different Linux kernels and check all that out is a major problem, especially
>> for our automated build system (which could definitely be implemented better,
>> but still).
>
> To be absolutely pedantic, the four linux kernels likely share most of
> their objects and so you're only paying the cost (at least during
> fetch) of including it once :)

That is true, but like I said the problem is the checkout.

Our different products use different kernels (or FreeBSD):

	Product 1 -- Linux vX
	Product 2 -- Linux vY
	Product 3 -- FreeBSD

(Luckily we're currently only using one version of FreeBSD...)

All the products use common code. When we release, we need to tag the
common code and the particular Linux kernel (or FreeBSD) we built the
product with.

We can't stuff all the Linux kernels into a single submodule, because then
the repo will be "dirty" if we check out a different Linux kernel to build
a different product. Even in a unified repo we'd need the kernels to live
in their own trees.

So we've ended up with individual submodules for each Linux kernel, and
we've taught our automated build to only clone/checkout the kernel it
needs to build the target product. Otherwise the checkout I/O overshadows
the actual build time, especially when we try to run several builds in
parallel on one slave machine.

> (If you're actually using git-submodule and each copy of the kernel is
> its own module, then it might be cloning the kernel four times
> separately, in which case the objects *don't* get shared, so this ends
> up being much more expensive than it should be. That could be fixed
> by slightly improving git-submodule to share some objects rather than
> rearchitecting it though.)

Even with the --reference parameter, it's still a problem.

>> In truth it's the checkout that takes the most time by far,
>> though commands like git-status also take inconveniently long.
>
> Yeah, git could stand to be optimized a bit here. And since Windows
> stats files about 10x slower than Linux, this problem occurs about 10x
> sooner on Windows, which makes using git on Windows (which sadly I
> have to do sometimes) extremely painful compared to Linux.
>
> IMHO, the correct answer here is to have an inotify-based daemon prod
> at the .git/index automatically when files get updated, so that git
> itself doesn't have to stat/readdir through the entire tree in order
> to do any of its operations. (Windows also has something like inotify
> that would work.) If you had this, then git
> status/diff/checkout/commit would be just as fast with zillions of
> files as with 10 files. Sooner or later, if nobody implements this, I
> promise I'll get around to it since inotify is actually easy to code
> for :)
>
> Also note that the only reason submodules are faster here is that
> they're ignoring possibly important changes. Notably, when you do
> 'git status' from the top level, it won't warn you if you have any
> not-yet-committed files in any of your submodules. Personally, I
> consider that to be really important information, but to obtain it
> would make 'git status' take just as long as without submodules, so
> you wouldn't get any benefit. (I think nowadays there's a way to get
> this recursive status information if you want it, but it'll be slow of
> course.)

I'm happy with a "git status" that can ignore uninitialized submodules and
still probe into initialized/cloned ones. I agree that it's important for
"git status" to be correct.

>> We chose git-submodule over git-subtree mainly because git-submodule lets us
>> selectively checkout different parts of our code. (AFAIK sparse checkouts
>> aren't yet an option.)
>
> Fair enough. If you could confirm or deny my theory that this is
> *entirely* a performance related concern (as opposed to disk space /
> download time), that would be helpful.

Consider it confirmed.

Honestly, disk space is a complete non-issue. It's always nice to have
faster download times, but it hasn't been an issue for us and there are
already several ways to work around it anyway.

>> We didn't really consider git-subtree because it's
>> not an official part of git, and we didn't want to have to teach (and nag)
>> all our developers to install and maintain it in addition to keeping up with
>> git itself.
>
> Arguably, this is a vote for including git-subtree into the core
> (which was Bryan's point when he started this thread); it obviously is
> being rejected sometimes by git users simply because it's not in the
> core, even though it could help them.

Yes, I have no objection to seeing git-subtree becoming an official part
of git. My only complaint would be that it doesn't really help git deal
with huge repos.

		M.

^ permalink raw reply [flat|nested] 58+ messages in thread
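The inotify idea quoted in the message above (a daemon tells git which paths changed, so that status does not have to stat the whole tree) can be illustrated with a toy model. This is only a sketch under stated assumptions, not git's index code: `DirtyIndex`, `notify` and `quick_status` are hypothetical names, and the watcher is simulated by calling `notify` by hand instead of using inotify.

```python
import os
import tempfile


class DirtyIndex:
    """Toy stand-in for an index kept fresh by a file-change watcher (hypothetical)."""

    def __init__(self, root):
        self.root = root
        self.snapshot = {}   # relative path -> (mtime_ns, size) at last refresh
        self.dirty = set()   # paths the watcher reported since the last refresh
        self.stat_calls = 0  # counts how much filesystem work was done

    def _signature(self, rel):
        self.stat_calls += 1
        try:
            st = os.stat(os.path.join(self.root, rel))
        except FileNotFoundError:
            return None  # path no longer exists
        return (st.st_mtime_ns, st.st_size)

    def _refresh(self, rel):
        """Re-stat one path; return True if it differs from the snapshot."""
        sig = self._signature(rel)
        if self.snapshot.get(rel) == sig:
            return False
        self.snapshot[rel] = sig
        return True

    def full_scan(self):
        """Without a watcher: walk the tree and stat every file.
        (This toy version does not detect deletions.)"""
        changed = []
        for dirpath, _dirs, files in os.walk(self.root):
            for name in files:
                rel = os.path.relpath(os.path.join(dirpath, name), self.root)
                if self._refresh(rel):
                    changed.append(rel)
        return sorted(changed)

    def notify(self, rel):
        """Called by the (simulated) watcher when a path changes."""
        self.dirty.add(rel)

    def quick_status(self):
        """With a watcher: stat only the paths that were flagged."""
        changed = [rel for rel in sorted(self.dirty) if self._refresh(rel)]
        self.dirty.clear()
        return changed


def demo():
    with tempfile.TemporaryDirectory() as root:
        for i in range(50):
            with open(os.path.join(root, f"f{i}.txt"), "w") as fh:
                fh.write("x")
        idx = DirtyIndex(root)
        idx.full_scan()                       # initial snapshot: one stat per file
        scan_cost = idx.stat_calls
        with open(os.path.join(root, "f7.txt"), "w") as fh:
            fh.write("changed")               # different size, so the signature differs
        idx.notify("f7.txt")                  # what a real watcher would report
        changed = idx.quick_status()
        return scan_cost, idx.stat_calls - scan_cost, changed


if __name__ == "__main__":
    print(demo())  # (50, 1, ['f7.txt'])
```

In this model the cost of the quick path depends on the number of changed files rather than on the size of the tree, which is the property claimed for the daemon approach; a real implementation would also have to cope with missed events and daemon restarts.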
* Re: Avery Pennarun's git-subtree?
  2010-07-21 21:09 ` Avery Pennarun
  2010-07-21 21:20 ` Avery Pennarun
  2010-07-21 22:46 ` Jens Lehmann
@ 2010-07-21 23:46 ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 58+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-07-21 23:46 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Bryan Larsen, git, Junio C Hamano

On Wed, Jul 21, 2010 at 21:09, Avery Pennarun <apenwarr@gmail.com> wrote:
> On Wed, Jul 21, 2010 at 4:36 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> On Wed, Jul 21, 2010 at 19:56, Avery Pennarun <apenwarr@gmail.com> wrote:
>>> No amount of bugfixing in git submodule can fix this workflow, because
>>> it's not a result of bugs. (The bugs, particularly the
>>> disconnected-by-default HEADs on submodule checkouts, do make it a bit
>>> worse :( ) It would require a fundamental redesign to make this work
>>> nicely with submodules.
>> [...]
>> I think most of those can be fixed, actually. The only requirement
>> that the git plumbing imposes on git-submodules is that a "commit"
>> entry exist in your tree, the rest is just (ugly plumbing).
>
> Sure. But this commit object (and the objects it points to) are never
> automatically pushed, fetched, or fsck'd. They're second class
> citizens. As it turns out, this was a major design mistake in
> implementing the submodule commit objects.
>
> All the behaviour people *currently* get from submodules could have
> been obtained without using a new 'commit' object type at all. Just
> add a commitid to the horrible junk (including repo URLs, argh) that
> already needs to get pasted into .gitmodules, and have git-commit at
> the top level update .gitmodules automatically (as it currently
> updates the 'commit' tree entries). Problem solved (at least, solved
> to exactly the extent that it is today).

Yeah, that does sound better than the current mess.

> What we *really* want is a way to have git actually recurse through
> commit objects when doing *any* operation, as if they were tree
> objects. If we had that, submodules could be beautiful (because you'd
> push them to the same repo, etc and users would see none of the
> complexity). But this doesn't exist. And for backward compatibility
> at this point, we'd probably need to introduce an entirely new kind of
> tree entry to support such a thing.
>
>> Thus, we could:
>>
>>  * Hack git-submodule (or its replacement) to check import the tree
>>    that contains that "commit" into one central .git
>
> This part is relatively easy, I think - at least in concept, although
> I bet there would be widespread implementation tweaks - and would
> clean up a lot of the mess. However it would require a change to the
> .git/index file format to remember when a subdir is a commit and not a
> "normal" tree so that it doesn't silently commit the next thing as a
> tree instead.
>
>>  * Fix git status / git commit so that you could commit into
>>    submodules, i.e.:
>>
>>        for each submodule in this-commit:
>>            chdir $submodule && commit
>>        done && cd $root && commit -m"bumping submodules"
>
> After making the earlier change to get rid of the extra .git subdirs,
> this next requirement would actually be considerably more work,
> because 'git commit' would need to know how to update a subcommit
> without changing HEAD. You certainly couldn't just code it up as a
> recursive "git commit" as you imply (and as you could do right now).
>
>>  * Make git-push push the submodule contents and the
>>    superprojects. You'd just need to have commit access to the url
>>    listed in .gitmodules.
>
> This is really a *killer* problem, and you're making it sound easy.
> Let's imagine that my app has 25 different submodules - not
> unreasonable at all in a world with dozens of ever-changing ruby gems
> and suchlike.
>
> Now, if I want to branch my project, I might have to branch 25
> projects just so I can push my changes? It's totally awful. And the
> awfulness is multiplied many times over if .gitmodules has hard-coded
> repo paths, because then I have to update the repo path in my branch
> but not the other branch, and merging will have conflicts. You might
> think that my .git/config could just override .gitmodules, but then
> some guy trying to fetch my branch will fail to fetch the submodules
> from my branch and get errors and have no idea what's going on.
>
> And you might think that using relative repo paths in .gitmodules
> would work, but that's only if I branched all 25 submodules in the
> *first* place. In real life, most subprojects point at the original
> project's home repo by default (because nobody thinks they'll be
> patching 25 subprojects when they start, and they're probably right),
> but then you have to individually change the URLs when you decide you
> need to patch them, and life gets complicated and ugly, especially
> when the next guy goes to fork your project and now needs to fork some
> subprojects but not others.
>
> There is no good solution to the submodule problem if each submodule
> has to go in its own repo. I've been thinking about this for years
> now, and watching lots of discussions about it on the git mailing
> list, and I just can't see any other option. All the submodules have
> to get pushed to and fetched from the same repo by default. Anything
> else is insane.

Yeah, bundling the submodules in the upstream repo so only one person
ever has to worry about gathering them up and pushing them to the
central repo sounds better for most uses than the current submodule
implementation.

OTOH, I have some submodules that I track on GitHub that would really
inflate the size of the repo that's tracking them. So there are
definitely use cases for having the tree somewhere remotely as well,
especially for large submodules like game art, which some people have
reported using submodules for.

> One option might be to store the submodule commit refs as refs in your
> superproject. That wouldn't actually be so bad, except for the
> aforementioned problem that fetch/push/clone/etc don't actually trace
> through commit objects when deciding what objects to send you, so
> fetching the ref of the superproject wouldn't autofetch the subproject
> refs. Also, you could accidentally delete one of the subproject refs
> and lose tons of history without ever realizing it. That's error
> prone and confusing... and clutters up your repo refs list with
> administrative stuff you didn't actually want in the first place.
>
>> What's missing from that (which would be nice) is the ability to check
>> out a subdirectory from another repository. That could (I think) be
>> done by just adding a normal "tree" entry, and then specifying that
>> that tree can be found in git://... instead of the main tree.
>
> Actually that's already easy with submodules (and git-subtree makes it
> easy too, though slightly different). Just fetch the commit from the
> other repo, and do:
>
>     git checkout FETCH_HEAD -- subdirname
>
>>> If we can get some kind of consensus in principle that git-subtree is
>>> a good idea to merge into git core, I can prepare some patches and we
>>> can talk about the details.
>>
>> From having looked at it briefly it looks very nice. But it looks to
>> me as if the main differences between git-submodule and git-subtree
>> are in the porcelain, not the plumbing.
>
> No. The fundamental difference is exactly one: git-subtree uses
> normal 'tree' entries (rather than commits) in its trees, so that all
> the git tools recurse through them like any other tree. Thus you
> don't need any extra refs, extra .git dirs, etc. That allows you to
> bypass all the useless behaviour git has around 'commit' entries.
> This is very much a plumbing difference.
>
> The git-submodule porcelain happens to independently be kind of
> annoying and inconvenient, but that would be much easier to fix if it
> weren't for the plumbing-related problems.
>
>> It would be a lot less confusing to users of Git in the long term if
>> we would at least try to unify these two approaches instead of having
>> two mutually incompatible ways of doing essentially the same thing.
>
> True. But I don't have the time, and implementing the new 'commit'
> entry semantics sounds like a lot of work (as opposed to arguing about
> them, which I guess I'm good at but which seems unproductive).
>
> In productive terms: git-subtree is solving problems for real users
> right now. It might solve more problems for more users if it were
> integrated into the core and thus made "official." Nothing precludes
> making submodules better later.

Sure, don't get me wrong. git-subtree looks very useful, and I have no
objection to having it in git.git. Even if it's not optimal for
everything, good working software now shouldn't be held up by some
theoretical pie-in-the-sky system.

^ permalink raw reply [flat|nested] 58+ messages in thread
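The plumbing difference described in the message above (normal 'tree' entries are recursed into, 'commit' entries are not) can be shown with a toy object store. This is a minimal sketch under a simplified model: the object ids are made-up strings, commits carry only a tree (no parents), and `reachable` is a hypothetical helper, not a git API. The mode strings are the ones git shows for trees, regular files and submodule entries.

```python
# Entry modes as shown by git for tree entries.
TREE, BLOB, GITLINK = "040000", "100644", "160000"

# Toy object store: id -> (kind, payload). All ids are invented for illustration.
store = {
    "lib-blob": ("blob", b"int f(void);"),
    "lib-tree": ("tree", [(BLOB, "lib.h", "lib-blob")]),
    "lib-commit": ("commit", "lib-tree"),
    "app-blob": ("blob", b"int main(void);"),
    # Superproject root when lib/ is embedded as a normal tree (subtree style).
    "subtree-root": ("tree", [(BLOB, "main.c", "app-blob"),
                              (TREE, "lib", "lib-tree")]),
    # Superproject root when lib/ is a 'commit' entry (submodule style).
    "submodule-root": ("tree", [(BLOB, "main.c", "app-blob"),
                                (GITLINK, "lib", "lib-commit")]),
    "super-subtree": ("commit", "subtree-root"),
    "super-submodule": ("commit", "submodule-root"),
}


def reachable(objects, start):
    """Collect ids reachable from `start`, following tree and blob entries
    but not following 'commit' (gitlink) entries."""
    seen = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        kind, payload = objects[oid]
        if kind == "commit":
            stack.append(payload)            # a commit points at its root tree
        elif kind == "tree":
            for mode, _name, child in payload:
                if mode != GITLINK:          # gitlinks are recorded, not traversed
                    stack.append(child)
    return seen


with_subtree = reachable(store, "super-subtree")
with_submodule = reachable(store, "super-submodule")
```

Here `with_subtree` contains `lib-tree` and `lib-blob`, so anything that walks the superproject (push, fetch, fsck) carries the library along, while `with_submodule` stops at the superproject's own objects and the library has to travel by some other route.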