* Empty directories...
From: David Kastrup @ 2007-07-18 0:13 UTC
To: git

GIT(7) -- 03/05/2007

NAME
       git - the stupid content tracker

Well, I use git for tracking contents. That means, for example,
installation trees for some application. Let's take a typical TeXlive
tree as an example. Those trees contain, among other things,
directories where new fonts/formats/whatever get placed as things run.
Quite a few of them start out empty, but their permissions have to
correspond to their purpose (for example, some are world-writable).

I see little chance to get this achieved without doing something like

    find -type d -empty -execdir touch {}/.git-this-is-empty +

before every checkin and

    find -name .git-this-is-empty -exec rm -- {} +

after every checkout. Which is pretty stupid.

As some anecdotal stuff, I did something like

    mkdir test
    cd test
    git-init
    touch README
    git-add README
    # another peeve: why is no empty reference point possible?
    git-commit -a -m "Initial branch"
    git checkout -b newbranch master
    unzip ../somearchive -d subdir
    git add subdir
    git commit -a -m "Add subdir"
    git checkout -b newbranch2 master

and expect to have a clean slate. No such luck: without warning, all
empty directories in the zip file are still remaining within subdir,
which as a consequence has not been cleaned up.

So even if one is of the opinion that empty directories are not worth
putting into the repository: if I check in an entire subdirectory
hierarchy and then switch to a branch where this subdirectory is not
existent, I expect the subdirectory to be _gone_, and not have some
littering of empty directories lying around.

And that git-diff can see nothing wrong with that does not really
improve things.

So if git is supposed to be a content tracker, I can't see a way around
it actually being able to track content, and empty directories _are_
content. It can't leave them flying around with arbitrary permissions
on them when I switch branches or tags.

And the workaround using "touch" mentioned above is really awful to do
manually all the time.

Could git technically track a file with a zero-length filename in empty
directories if one tells it explicitly to include it, like with

    git-add \! -x "" subdir

or has somebody a better idea or interface or rationale? I understand
that there are use cases where one does not bother about empty
directories, but for a _content_ tracker, not tracking directories
because they are empty seems quite serious.

Ok, kill me. This must likely be the most common FAQ/rant/whatever
concerning git.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

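Editorial sketch of the placeholder workaround described in the message
above, wrapped in two shell functions. This is a minimal illustration and
not code from the thread: the function names add_placeholders and
remove_placeholders are hypothetical, only the marker file name comes from
the message. It runs one `sh -c` per directory with `\;` because the `+`
form of -exec only recognizes `{}` as a standalone final argument, and it
skips `.git`, which is an added assumption about how one would want to use
this inside a repository.

```shell
#!/bin/sh
# Hypothetical helpers; only the marker name is taken from the message above.
marker=.git-this-is-empty

# Drop a marker file into every empty directory under $1, skipping .git.
add_placeholders() {
    find "$1" -name .git -prune -o -type d -empty \
        -exec sh -c 'touch "$1/$2"' sh {} "$marker" \;
}

# Remove the marker files again, e.g. after a checkout.
remove_placeholders() {
    find "$1" -name .git -prune -o -type f -name "$marker" -exec rm -- {} +
}

# Demonstration on a throwaway tree with exactly one empty directory.
tree=$(mktemp -d)
mkdir -p "$tree/fonts/cache" "$tree/doc"
touch "$tree/doc/README"

add_placeholders "$tree"
placed=$(find "$tree" -name "$marker" | wc -l | tr -d ' ')

remove_placeholders "$tree"
left=$(find "$tree" -name "$marker" | wc -l | tr -d ' ')

echo "placed=$placed left=$left"
```

Run before a commit and after a checkout respectively, the two functions
automate the manual steps, but the markers still have to be managed
outside git, which is the inconvenience the message complains about.
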
* Re: Empty directories...
From: Johannes Schindelin @ 2007-07-18 0:35 UTC
To: David Kastrup; +Cc: git

Hi,

On Wed, 18 Jul 2007, David Kastrup wrote:

> This must likely be the most common FAQ/rant/whatever concerning git.

If you had the idea already, I wonder why you did not find it. It's not
really anything like hard to find:

http://git.or.cz/gitwiki/GitFaq#head-1fbd4a018d45259c197b169e87dafce2a3c6b5f9

Ciao,
Dscho

* Re: Empty directories...
From: David Kastrup @ 2007-07-18 6:07 UTC
To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Wed, 18 Jul 2007, David Kastrup wrote:
>
>> This must likely be the most common FAQ/rant/whatever concerning git.
>
> If you had the idea already, I wonder why you did not find it. It's not
> really anything like hard to find:
>
> http://git.or.cz/gitwiki/GitFaq#head-1fbd4a018d45259c197b169e87dafce2a3c6b5f9

The FAQ answer is weaseling on several accounts:

a) No, git only cares about files, or rather git tracks content and
   empty directories have no content.

In the same manner as empty regular files have no contents, and git
tracks those. Existence and permissions are important.

b) The problem is not just that empty directories don't get added into
the repository. They also don't get removed again when switching to a
different checkout. When git-diff returns zero, I expect a subsequent
checkout to not leave complete empty hierarchies around because git
can't delete any empty leaves which it chose not to track.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Empty directories...
From: Johannes Schindelin @ 2007-07-18 10:26 UTC
To: David Kastrup; +Cc: git

Hi,

On Wed, 18 Jul 2007, David Kastrup wrote:

> The FAQ answer is weaseling on several accounts:
>
> a) No, git only cares about files, or rather git tracks content and
>    empty directories have no content.
>
> In the same manner as empty regular files have no contents, and git
> tracks those. Existence and permissions are important.

We do not track permissions of directories at all. This is because Git
is primarily meant to track source code, and most "permissions" (i.e.
restrictions) do not make any sense there.

> b) The problem is not just that empty directories don't get added into
> the repository. They also don't get removed again when switching to a
> different checkout. When git-diff returns zero, I expect a subsequent
> checkout to not leave complete empty hierarchies around because git
> can't delete any empty leaves which it chose not to track.

I _like_ the behaviour that Git does not remove a directory it added,
when I put some untracked file into it.

And switching back to that branch, Git has no problems, because it sees
that the directory is already there. In case of a file, it would
complain, and rightfully so.

See the fundamental difference between a file and a directory now?

I think it boils down to "an empty directory has _no_ contents, but an
empty file has an _empty_ content".

Ciao,
Dscho

[parent not found: <86tzs2m1h7.fsf@lola.quinscape.zz>]
* Re: Empty directories...
From: Johannes Schindelin @ 2007-07-18 11:24 UTC
To: David Kastrup; +Cc: git

Hi,

On Wed, 18 Jul 2007, David Kastrup wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > On Wed, 18 Jul 2007, David Kastrup wrote:
> >
> >> The FAQ answer is weaseling on several accounts:
> >>
> >> a) No, git only cares about files, or rather git tracks content and
> >> empty directories have no content.
> >>
> >> In the same manner as empty regular files have no contents, and git
> >> tracks those. Existence and permissions are important.
> >
> > We do not track permissions of directories at all.
>
> Ok, this seems like something that should be done as well, even if we
> can stipulate at first that a directory should have rwx for the user
> in question if you hope to track it.

No, no, no. It should not be tracked. It is the responsibility of the
_user_ to set it to something sane, be that by a umask or by sticky
groups, or by setting the permissions of the parent directory.

It is _nothing_ we want to put into the repository. That is the _wrong_
place to put it.

> > This is because Git is primarily meant to track source code,
>
> Tell that to the man page. It declares git to be "a content tracker"
> right at the front.

Why don't you? I have no problems with the title.

> > and most "permissions" (i.e. restrictions) do not make any sense
> > there.
>
> So why are permissions for files being tracked, then?

This question is invalid. Git only tracks the _executable_ bit. And
again, it is the users' responsibility, by setting the umask, to have
the appropriate bits set for group and others.

> >> b) The problem is not just that empty directories don't get added
> >> into the repository. They also don't get removed again when
> >> switching to a different checkout. When git-diff returns zero, I
> >> expect a subsequent checkout to not leave complete empty hierarchies
> >> around because git can't delete any empty leaves which it chose not
> >> to track.
> >
> > I _like_ the behaviour that Git does not remove a directory it
> > added, when I put some untracked file into it.
>
> But it does not remove a directory it _refused_ to add when there were
> no files at all in it ever. You probably have not read the problem
> description carefully.

I have. But that does not apply here, because I used the term "to add a
directory" in the sense of "mkdir".

> > And switching back to that branch, Git has no problems, because it
> > sees that the directory is already there. In case of a file, it would
> > complain, and rightfully so.
>
> And if you switch to a branch where the directory it did not remove now
> is a file?

Git already throws an error, and rightfully so. I am pleased by the
current behaviour.

> > See the fundamental difference between a file and a directory now?
>
> Condescension is not really solving a problem.

Hey, I only tried to help clarify things. But since I seem to be unable
to, I'll end my efforts with this suggestion:

If you want to track empty directories, the best thing would be to

- teach git-add to automatically create an empty .gitignore (and error
  out if that already exists), and

- teach git-archive to not put .gitignore files into the output by
  default (but the directories).

This might be a sensible change regardless if you want to add empty
directories to the repository or not.

Ciao,
Dscho

* Re: Empty directories...
From: Matthieu Moy @ 2007-07-18 11:40 UTC
To: Johannes Schindelin; +Cc: David Kastrup, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > We do not track permissions of directories at all.
>>
>> Ok, this seems like something that should be done as well, even if we
>> can stipulate at first that a directory should have rwx for the user
>> in question if you hope to track it.
>
> No, no, no. It should not be tracked. It is the responsibility of the
> _user_ to set it to something sane, be that by a umask or by sticky
> groups, or by setting the permissions of the parent directory.
>
> It is _nothing_ we want to put into the repository. That is the _wrong_
> place to put it.

I'm not sure it's wrong to be able to track permissions, but it's
definitely wrong to track them by default.

GNU Arch had some permission tracking, and I got hit by it several
times. You have several things you might have wanted to track:

* read/write for the user. But I can't imagine a case where you
  wouldn't want to be able to read and write your own files.

* permissions for group. But that doesn't make any sense when several
  persons work on the same project, and don't share the same
  /etc/group.

* permissions for others. But that, again, doesn't make sense when
  several persons work on the same project with different setups. I
  sometimes work at home, where I'm basically the only user, I don't
  care at all about permissions for others. At work, it's totally
  different, since it's a big NFS shared by all the lab. And I might
  very well disclose my work to the rest of the lab, and work with
  someone who does not want to do so.

* Execute bit. This one is relevant. Indeed, it's more a kind of
  metadata than really a permission (you can still execute the file
  with /lib/ld-linux.so.2 /path/to/file or such kind of things).

Using GNU Arch, I got the cases in real life of a project in which some
files had group read permission, some other not, because they were
created by developers having different umask. Worse than this, I got
some group-writable files in my $HOME without noticing it, which is
basically a security hole.

-- 
Matthieu

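Editorial sketch of the umask effect described in the message above: two
files created the same way end up with different group and other bits
purely because of the creating user's umask. This is a minimal
illustration under stated assumptions, not code from the thread; the file
names by-alice and by-bob and the helper perms are hypothetical, and the
two umask values are just common examples.

```shell
#!/bin/sh
# Show that the creator's umask, not the project, decides group/other bits.
work=$(mktemp -d)

# Print the ten-character permission string of a file (e.g. -rw-r--r--).
perms() {
    ls -ld "$1" | cut -c1-10
}

# Each subshell stands in for a different developer's environment.
( umask 022; : > "$work/by-alice" )   # permissive umask
( umask 077; : > "$work/by-bob" )     # restrictive umask

alice=$(perms "$work/by-alice")
bob=$(perms "$work/by-bob")

echo "by-alice: $alice"
echo "by-bob:   $bob"
```

A tracker that recorded these bits verbatim would store the difference
between the two developers' setups rather than anything about the project
itself, which is the failure mode reported for GNU Arch.
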
* Re: Empty directories...
From: David Kastrup @ 2007-07-18 12:12 UTC
To: git

Matthieu Moy <Matthieu.Moy@imag.fr> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>>> > We do not track permissions of directories at all.
>>>
>>> Ok, this seems like something that should be done as well, even if we
>>> can stipulate at first that a directory should have rwx for the user
>>> in question if you hope to track it.
>>
>> No, no, no. It should not be tracked. It is the responsibility of the
>> _user_ to set it to something sane, be that by a umask or by sticky
>> groups, or by setting the permissions of the parent directory.
>>
>> It is _nothing_ we want to put into the repository. That is the _wrong_
>> place to put it.
>
> I'm not sure it's wrong to be able to track permissions, but it's
> definitely wrong to track them by default.

I am not sure about "definitely", but there certainly are applications
where it is appropriate.

> * Execute bit. This one is relevant. Indeed, it's more a kind of
>   metadata than really a permission (you can still execute the file
>   with /lib/ld-linux.so.2 /path/to/file or such kind of things).

Please spare us the sophistry.

Probably the most flexible approach would be to be able to specify a
checkout umask, defaulting to 700 (the other bits are then filled in
from the normal user umask). For archival purposes, one would then set
it to 777 instead.

There is the question how to deal with checkins. While there is no harm
in checking in the full permissions in case one would need them, it
would likely be a nuisance to track the individual contributor's
settings.

-- 
David Kastrup

* Re: Empty directories...
From: Linus Torvalds @ 2007-07-18 16:23 UTC
To: David Kastrup; +Cc: Johannes Schindelin, git

On Wed, 18 Jul 2007, David Kastrup wrote:
>
> In the same manner as empty regular files have no contents, and git
> tracks those. Existence and permissions are important.

Yes, but directories really are different.

First off, git wouldn't track the permissions anyway (git tracks execute
bits, but for directories that _has_ to be set or git couldn't use them
itself, so that's not going to happen).

Second, and much more important, the directories will exist or not
*regardless* of what git does.

> b) The problem is not just that empty directories don't get added into
> the repository. They also don't get removed again when switching to a
> different checkout.

Bzzt. Wrong.

We *do* remove directories when all files under them go away.

HOWEVER (and this is where one of the reasons for not tracking them
comes in):

  ** YOU CANNOT REMOVE A DIRECTORY IF IT HAS SOME UNTRACKED CONTENTS **

Think about that for five seconds, then think about it some more. Ponder
it.

So the fact is, git *already* does as good a job as it could possibly
do wrt directories that go away: it tries to remove them if all the
files that are tracked in it have gone away.

But that leaves a very common case, namely switching to another branch
without those files, and the directory still having stale object files
etc build crud in it.

An SCM *must*not* just remove that directory. It would be horrible. The
fact that it has untracked files in it does not make those untracked
files "unimportant". Maybe you feel that way about object files, but
what about tracking some important parts of your home directory - does
the fact that you don't necessarily track *all* of it mean that the
rest is totally unimportant and that git should just remove it?

HELL NO!

So directories really _are_ problematic. You cannot (and should not)
track them the same way as you track a file. And the difference is very
fundamental indeed: when you track a regular file, you track *all* of
its content. But when you track a directory, you don't track its
content *at*all*.

Think about that, and then think about the fact that git is defined as a
"content tracker", and it's not "weasely" at all to say that you don't
track directories.

So your argument is totally bogus. When you track an empty file, you
very much track the *content* of that file, and "empty" just happens to
be a very valid content. But when you track a "directory", you don't
actually track its content at all, you track its *existence*, which is
a very very very different thing.

I hope you understand from the above what is so different.

(A true "directory content" tracker by definition would have to track
every single file under that directory. You can claim that for the case
of an empty directory the "existence tracking" is 100% equivalent with
"content tracking", but that's simply not true. It becomes non-true the
moment there are any files at all inside that directory, and be honest
now: the only _point_ of an empty directory is that you expect it to
potentially get files under it).

So "existence" != "content". Git very much does not track "existence"
of files, it tracks the total content of them too.

		Linus

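Editorial sketch of the rule stated in the message above: a directory can
only go away once nothing is left in it, so leftover untracked files keep
it alive. This is a minimal illustration that mimics a branch switch with
plain rm and rmdir; it does not invoke git, and the names build, main.c
and main.o are hypothetical stand-ins for a tracked source file and
untracked build output.

```shell
#!/bin/sh
# Mimic a branch switch with plain rm/rmdir; no git involved.
work=$(mktemp -d)
mkdir "$work/build"
: > "$work/build/main.c"     # stands in for a tracked file
: > "$work/build/main.o"     # stands in for untracked build output

# "Switch branches": the tracked file goes away, then try the directory.
rm "$work/build/main.c"
if rmdir "$work/build" 2>/dev/null; then
    first="removed"
else
    first="kept"             # rmdir refuses: main.o is still inside
fi

# Once the untracked file is gone too, the directory can be removed.
rm "$work/build/main.o"
if rmdir "$work/build" 2>/dev/null; then
    second="removed"
else
    second="kept"
fi

echo "with untracked file: $first; after cleanup: $second"
```

The refusal in the first step is the safe outcome: the alternative would
be deleting files the tool knows nothing about.
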
* Re: Empty directories...
From: Linus Torvalds @ 2007-07-18 16:33 UTC
To: David Kastrup; +Cc: Johannes Schindelin, git

On Wed, 18 Jul 2007, Linus Torvalds wrote:
>
> So "existence" != "content". Git very much does not track "existence" of
> files, it tracks the total content of them too.

Btw, don't get me wrong: I think that in order to be better at tracking
other SCM's idiotic choices, we could (and I foresee that we eventually
have to) try to track empty directories as a special case too.

So I'm not _against_ the notion of tracking empty directories, and I
would welcome patches that do so. As I mentioned in some earlier thread
when this came up a few weeks ago, I actually suspect that the
"subproject" support probably ended up making it easier, because in
many ways an "empty directory" is very close to an "anonymous
subproject" from a low-level plumbing standpoint (even if it is *not*
so from a high-level standpoint).

So I suspect that adding support for empty directories ends up being
about just slightly extending the places that now have subproject
support to know about a new situation.

But I do want to point out that "tracking a directory" is not at all the
same thing as "tracking a file", no matter how much you try to argue
otherwise. The semantics are totally different, and it all boils down to
the fact that when you track a file, you are always talking about the
*full* content of the file, while tracking a directory is always about
tracking just a *subset* of the contents of the directory.

Of course, with directories, there's the trivial case where the subset
happens to be everything, but that is neither the common nor the
interesting case. All the interesting and complex cases happen exactly
when the directory has untracked files in it, and at that point

 - you really aren't tracking "contents" any more

 - you can no longer recreate the directory from the data you have (so
   you cannot remove it on branch switches etc)

 - ergo: you're not a content tracker any more, you're a "container"
   tracker.

And really, the "nontracked files in a directory" is the *default*
thing, not some really unusual thing that we could disallow.

But I'm not against adding support for "container tracking". I just want
people to understand that it's something totally different from what we
do now. It's much more like subproject support than tracking files.

		Linus

* Re: Empty directories...
From: David Kastrup @ 2007-07-18 17:38 UTC
To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But I do want to point out that "tracking a directory" is not at all
> the same thing as "tracking a file", no matter how much you try to
> argue otherwise.

Since I did not try to argue this, could you beat another strawman? I
have seen this prepackaged rant already, but it does not really address
the problem I have been experiencing.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Empty directories...
From: Linus Torvalds @ 2007-07-18 18:05 UTC
To: David Kastrup; +Cc: git

On Wed, 18 Jul 2007, David Kastrup wrote:
>
> Since I did not try to argue this, could you beat another strawman?

How about a bit of honesty? Here's the quote:

  "The FAQ answer is weaseling on several accounts:

   a) No, git only cares about files, or rather git tracks content and
      empty directories have no content.

   In the same manner as empty regular files have no contents, and git
   tracks those. Existence and permissions are important."

You called it "weaselly" to say that git tracks only content, and then
very much tried to equate "existence and permissions" with content.

That's the part I answered. So it wasn't a strawman, it was a direct
answer to your assertion.

Now go away and either come back with the patch to implement it (that I
have encouraged you to do), or add a ".gitignore" file to the directory
(that others have told you will solve your problems).

Don't bother talking crap.

		Linus

* Re: Empty directories...
From: Matthieu Moy @ 2007-07-18 16:39 UTC
To: Linus Torvalds; +Cc: David Kastrup, Johannes Schindelin, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

>> b) The problem is not just that empty directories don't get added into
>> the repository. They also don't get removed again when switching to a
>> different checkout.
>
> Bzzt. Wrong.
>
> We *do* remove directories when all files under them go away.
>
> HOWEVER (and this is where one of the reasons for not tracking them comes
> in):
>
>   ** YOU CANNOT REMOVE A DIRECTORY IF IT HAS SOME UNTRACKED CONTENTS **

I believe David's point was different.

If you checkout a branch, create an empty directory in this branch
(probably a placeholder, either for future versioned files, or for
generated files), you cannot tell git "this empty directory is in this
branch, but not in other ones" without adding a file in it.

So, doing "git-checkout anotherbranch", this empty directory doesn't go
away. It's just unversioned in both branches, git won't touch it.

-- 
Matthieu

* Re: Empty directories...
From: Linus Torvalds @ 2007-07-18 17:06 UTC
To: Matthieu Moy; +Cc: David Kastrup, Johannes Schindelin, git

On Wed, 18 Jul 2007, Matthieu Moy wrote:
>
> If you checkout a branch, create an empty directory in this branch
> (probably a placeholder, either for future versioned files, or for
> generated files), you cannot tell git "this empty directory is in this
> branch, but not in other ones" without adding a file in it.

Right. Which is the suggested setup: add an empty ".gitignore" file to
the directory, and you're done. It now acts "as if" git tracked the
directory (git will remove the directory when switching branches), but
without the lie that we really track any directory contents.

		Linus

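Editorial sketch of the ".gitignore" placeholder setup suggested in the
message above. This is a minimal illustration, not code from the thread;
it assumes a reasonably recent git on the PATH, and the branch name
with-spool, the path spool/incoming and the throwaway identity are
hypothetical. It uses the modern `git init` and `git add` spellings rather
than the dashed commands seen earlier in the thread.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # throwaway identity for the demo
git config user.email "user@example.invalid"

: > README
git add README
git commit -q -m "Initial commit"
base=$(git symbolic-ref --short HEAD)       # master or main, depending on setup

# On a second branch, keep an otherwise empty directory via a placeholder.
git checkout -q -b with-spool
mkdir -p spool/incoming
: > spool/incoming/.gitignore               # empty placeholder file
git add spool/incoming/.gitignore
git commit -q -m "Add spool/incoming placeholder"

# Back on the base branch the placeholder is gone, and so is the directory.
git checkout -q "$base"
if [ -d spool ]; then on_base="present"; else on_base="absent"; fi

# Returning to the branch recreates the directory along with the file.
git checkout -q with-spool
if [ -d spool/incoming ]; then on_branch="present"; else on_branch="absent"; fi

echo "on $base: $on_base; on with-spool: $on_branch"
```

The directory follows the branch only because a tracked file lives in it;
git still records nothing about the directory itself, such as its
permissions.
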
* Re: Empty directories...
From: David Kastrup @ 2007-07-18 21:37 UTC
To: Linus Torvalds; +Cc: Matthieu Moy, Johannes Schindelin, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 18 Jul 2007, Matthieu Moy wrote:
>>
>> If you checkout a branch, create an empty directory in this branch
>> (probably a placeholder, either for future versioned files, or for
>> generated files), you cannot tell git "this empty directory is in this
>> branch, but not in other ones" without adding a file in it.
>
> Right. Which is the suggested setup: add an empty ".gitignore" file
> to the directory, and you're done.

That implies that every directory in a versioned tree will exclusively
be created under manual and conscious control. Not by running some
installer or script, unpacking some archive and so on.

But if every content on a disk was created and put there under manual
control of the disk owner, we could still get along with floppy disks
quite fine.

In practice, much more content gets sent around and juggled than what is
under immediate supervision of the user.

This is getting silly: you don't need to pull out rabbits out of your
head. You said that you are not inclined to do any work in that area
since it does not touch _your_ use cases (well, at least not to a
degree that you consider worth bothering about) but that is no reason
to get into ridiculous arguments about other usage.

No code will come of that.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* Re: Empty directories...
From: Linus Torvalds @ 2007-07-18 21:45 UTC
To: David Kastrup; +Cc: Matthieu Moy, Johannes Schindelin, git

On Wed, 18 Jul 2007, David Kastrup wrote:
>
> You said that you are not inclined to do any work in that area
> since it does not touch _your_ use cases (well, at least not to a
> degree that you consider worth bothering about) but that is no reason
> to get into ridiculous arguments about other usage.

How hard is it for you to admit that I also said "please send in a
patch".

I don't need it. You do. You do the work. I'm just explaining why the
work hasn't been done.

		Linus

* Re: Empty directories...
From: David Kastrup @ 2007-07-18 23:13 UTC
To: Linus Torvalds; +Cc: Matthieu Moy, Johannes Schindelin, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 18 Jul 2007, David Kastrup wrote:
>>
>> You said that you are not inclined to do any work in that area
>> since it does not touch _your_ use cases (well, at least not to a
>> degree that you consider worth bothering about) but that is no reason
>> to get into ridiculous arguments about other usage.
>
> How hard is it for you to admit that I also said "please send in a
> patch".

Yup, that was one sentence in about 5 pages of bile. In contrast, Junio
gave a good overview of the technical areas involved here, and
estimates about what to do there best. That's a constructive way to
incite somebody to delve into the task and try to see whether he can
come up with something. But 5 pages of what amounts to "you are an
idiot, come up with a patch" is not leading anywhere.

> I don't need it. You do. You do the work. I'm just explaining why
> the work hasn't been done.

No, you are _defending_ why the work has not been done. This
rationalizing around the bush is a waste of time. You probably have
spent quite more time with your venting than Junio did with his
technical analysis, and the latter has been much more helpful.

So why waste all that time and adrenaline on something where you have
already said all you consider relevant? The arguments don't get any
stronger by shouting, and it is not like you are inconvenienced in any
manner if somebody takes a look at the matter.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

* [RFC PATCH] Re: Empty directories...
From: Linus Torvalds @ 2007-07-18 23:16 UTC
To: Junio C Hamano, David Kastrup; +Cc: Matthieu Moy, Johannes Schindelin, Git Mailing List

Gaah. I'm a damn softie (and soft in the head too, for writing the
code).

Ok, here's a trivial patch to start the ball rolling. I'm really not
interested in taking this patch any further personally, but I'm hoping
that maybe it can make somebody else who is actually _interested_ in
tracking empty directories (hint hint) decide that it's a good enough
start that they can fill in the details.

This really updates three different areas, which are nicely separated
into three different files, so while it's one single patch, you can
actually follow along the changes by just looking at the differences in
each file, which directly translate to separate conceptual changes:

 - builtin-update-index.c

   This simply contains the changes to update the index file. As usual,
   there are multiple different cases, and they boil down to:

   (a) No index entry existed at all previously.

       If so, a directory will first go through the "index_path()"
       logic, which tries to create a GITLINK entry for it, if the
       subdirectory is a git directory.

       However, the new thing is that if that fails, it will instead
       just create a fake empty tree entry for it, and set the index
       mode to S_IFDIR.

   (b) It was a gitlink entry before.

       It stays as a gitlink entry, even if it cannot be indexed, and a
       file/symlink entry in the working tree is a conflict error.

   (c) It was an empty directory entry before.

       A directory stays as an empty directory entry, and a
       file/symlink entry in the working tree is a conflict error.

   Somebody should check that we properly delete the directory entry if
   we add a file under it, I honestly didn't bother to go through all
   the logic. I *think* we do it correctly just thanks to all the
   previous code for gitlinks. Whatever. What I'm trying to say is that
   the changes are fairly straightforward, but if somebody decides to
   push this, they need to think about it a lot more than I'm ready to
   right now.

 - read-cache.c: match the new index type with the filesystem.

   This is pretty damn obvious. A S_ISDIR() always matches, and nothing
   else matches at all.

 - unpack-trees.c: unpack empty directories not by unpacking them
   recursively into the index, but by adding them directly to the index
   as a S_IFDIR entry instead.

   This one almost certainly needs more work, in particular when
   merging trees where one has an empty directory, and the other has
   files _in_ that directory! But the trivial approach makes a simple
   "git read-tree" with an empty directory unpack it into the index as a
   S_IFDIR entry, so now doing git-write-tree + git-read-tree should
   result in the original index contents.

I think the patch itself is pretty simple, but the subtle interactions
that flow out of this all are anything but. It may "just work" almost
as-is, but quite frankly, I think people need to think about all the
issues that can happen a lot!

So see this as a basis for further work. The "further work" may be
pretty simple, or it may not be. I'm personally not that interested, but
like my original "subprojects" series, hopefully somebody else ends up
running with this (or alternatively just proving that trying to track
empty directories is a total nightmare).

Linus
---
 builtin-update-index.c |   33 +++++++++++++++++++++++----------
 read-cache.c           |    4 ++++
 unpack-trees.c         |   12 +++++++++---
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/builtin-update-index.c b/builtin-update-index.c
index 509369e..2eb2a46 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -94,8 +94,16 @@ static int add_one_path(struct cache_entry *old, const char *path, int len, stru
 	fill_stat_cache_info(ce, st);
 	ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
 
-	if (index_path(ce->sha1, path, st, !info_only))
-		return -1;
+	if (index_path(ce->sha1, path, st, !info_only)) {
+		/*
+		 * If we weren't able to index the directory as a GITLINK,
+		 * see if we can just add it as a plain directory instead.
+		 */
+		if (!S_ISDIR(st->st_mode))
+			return -1;
+		ce->ce_mode = htonl(S_IFDIR);
+		pretend_sha1_file(NULL, 0, OBJ_TREE, ce->sha1);
+	}
 	option = allow_add ? ADD_CACHE_OK_TO_ADD : 0;
 	option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0;
 	if (add_cache_entry(ce, option))
@@ -134,6 +142,11 @@ static int process_directory(const char *path, int len, struct stat *st)
 	/* Exact match: file or existing gitlink */
 	if (pos >= 0) {
 		struct cache_entry *ce = active_cache[pos];
+
+		/* Was it a directory before? */
+		if (S_ISDIR(ntohl(ce->ce_mode)))
+			return 0;
+
 		if (S_ISGITLINK(ntohl(ce->ce_mode))) {
 
 			/* Do nothing to the index if there is no HEAD! */
@@ -162,12 +175,8 @@ static int process_directory(const char *path, int len, struct stat *st)
 		return error("%s: is a directory - add individual files instead", path);
 	}
 
-	/* No match - should we add it as a gitlink? */
-	if (!resolve_gitlink_ref(path, "HEAD", sha1))
-		return add_one_path(NULL, path, len, st);
-
-	/* Error out. */
-	return error("%s: is a directory - add files inside instead", path);
+	/* No match - try to just add it as-is */
+	return add_one_path(NULL, path, len, st);
 }
 
 /*
@@ -178,8 +187,12 @@ static int process_file(const char *path, int len, struct stat *st)
 	int pos = cache_name_pos(path, len);
 	struct cache_entry *ce = pos < 0 ? NULL : active_cache[pos];
 
-	if (ce && S_ISGITLINK(ntohl(ce->ce_mode)))
-		return error("%s is already a gitlink, not replacing", path);
+	if (ce) {
+		if (S_ISGITLINK(ntohl(ce->ce_mode)))
+			return error("%s is already a gitlink, not replacing", path);
+		if (S_ISDIR(ntohl(ce->ce_mode)))
+			return error("%s is already a directory entry, not replacing", path);
+	}
 
 	return add_one_path(ce, path, len, st);
 }
diff --git a/read-cache.c b/read-cache.c
index a363f31..d3d2cc0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -142,6 +142,10 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 		    (has_symlinks || !S_ISREG(st->st_mode)))
 			changed |= TYPE_CHANGED;
 		break;
+	case S_IFDIR:
+		if (!S_ISDIR(st->st_mode))
+			changed |= TYPE_CHANGED;
+		return changed;
 	case S_IFGITLINK:
 		if (!S_ISDIR(st->st_mode))
 			changed |= TYPE_CHANGED;
diff --git a/unpack-trees.c b/unpack-trees.c
index 89dd279..22e452b 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -181,9 +181,13 @@ static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 				any_dirs = 1;
 				parse_tree(tree);
 				subposns[i] = create_tree_entry_list(tree);
-				posns[i] = posns[i]->next;
-				src[i + o->merge] = o->df_conflict_entry;
-				continue;
+
+				/* If it wasn't empty, recurse into it */
+				if (subposns[i]) {
+					posns[i] = posns[i]->next;
+					src[i + o->merge] = o->df_conflict_entry;
+					continue;
+				}
 			}
 
 			if (!o->merge)
@@ -197,6 +201,8 @@ static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 			ce = xcalloc(1, ce_size);
 			ce->ce_mode = create_ce_mode(posns[i]->mode);
+			if (posns[i]->directory)
+				ce->ce_mode = htonl(S_IFDIR);
 			ce->ce_flags = create_ce_flags(baselen + pathlen, ce_stage);
 			memcpy(ce->name, base, baselen);

^ permalink raw reply related [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-18 23:16 ` [RFC PATCH] " Linus Torvalds @ 2007-07-18 23:40 ` Linus Torvalds 2007-07-18 23:42 ` David Kastrup 2007-07-21 4:29 ` David Kastrup 2 siblings, 0 replies; 137+ messages in thread From: Linus Torvalds @ 2007-07-18 23:40 UTC (permalink / raw) To: Junio C Hamano, David Kastrup Cc: Matthieu Moy, Johannes Schindelin, Git Mailing List On Wed, 18 Jul 2007, Linus Torvalds wrote: > > + if (!S_ISDIR(st->st_mode)) > + return -1; > + ce->ce_mode = htonl(S_IFDIR); > + pretend_sha1_file(NULL, 0, OBJ_TREE, ce->sha1); Oh, one word of warning: that whole "pretend_sha1_file()" thing won't create the object itself, and when I did the limited testing that I did, I actually made sure I had a magic zero-sized tree object in my object directory. If you don't, some things will complain, because they end up getting a SHA1 that they cannot look up, because *they* didn't create that pretend entry. I didn't know which way I wanted to go with that thing. I was kind of thinking that maybe we would just have the zero-sized OBJ_BLOB and OBJ_TREE objects as special magical things, and have all git programs just do that "pretend" at the beginning. But that kind of thing is probably just a totally unnecessary special case, and instead, that "pretend_sha1_file()" should have just been a write_sha1_file(NULL, 0, "tree", ce->sha1); instead. Anyway, if there are issues with not finding an object called 4b825dc642cb6eb9a060e54bf8d69288fbee4904, then that's the empty tree object, and that pretend thing was the cause. (The git repo itself has the empty tree as an object in it, because one of the commits has that - probably as a result of a bug, but there you have it) Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
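For anyone who wants to check the object name quoted above without a repository at hand: a git object name is the SHA-1 of a header of the form "<type> <size>" plus a NUL byte, followed by the object's content, so a zero-length tree hashes only the header. A minimal sketch, assuming a POSIX shell and the `sha1sum` tool from GNU coreutils; the variable name is made up for illustration.

```shell
# Hash the header of a zero-length tree object: "tree 0" followed by a NUL byte.
# An empty tree has no content after the header.
empty_tree=$(printf 'tree 0\0' | sha1sum | cut -d' ' -f1)
echo "$empty_tree"
```

With git installed, `git hash-object -t tree /dev/null` should report the same name.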
* Re: [RFC PATCH] Re: Empty directories... 2007-07-18 23:16 ` [RFC PATCH] " Linus Torvalds 2007-07-18 23:40 ` Linus Torvalds @ 2007-07-18 23:42 ` David Kastrup 2007-07-19 0:22 ` Linus Torvalds [not found] ` <?= =?ISO-8859-1?Q?alpine.LFD.0.999?= =?ISO-8859-1?Q?.070718=041710271.?= =?ISO-8859-1?Q?27353@woody.linu?= =?ISO-8859-1?Q?x-foundation.org?= =?ISO-8859-1?Q?> 2007-07-21 4:29 ` David Kastrup 2 siblings, 2 replies; 137+ messages in thread From: David Kastrup @ 2007-07-18 23:42 UTC (permalink / raw) To: Linus Torvalds Cc: Junio C Hamano, Matthieu Moy, Johannes Schindelin, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > Gaah. > > I'm a damn softie (and soft in the head too, for writing the code). > > Ok, here's a trivial patch to start the ball rolling. I'm really not > interested in taking this patch any further personally, but I'm hoping > that maybe it can make somebody else who is actually _interested_ in > trackign empty directories (hint hint) decide that it's a good enough > start that they can fill in the details. Well, kudos. Together with the analysis from Junio, this seems like a good start. Would you have any recommendations about what stuff one should really read in order to get up to scratch about git internals? -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-18 23:42 ` David Kastrup @ 2007-07-19 0:22 ` Linus Torvalds 2007-07-19 5:28 ` Junio C Hamano [not found] ` <?= =?ISO-8859-1?Q?alpine.LFD.0.999?= =?ISO-8859-1?Q?.070718=041710271.?= =?ISO-8859-1?Q?27353@woody.linu?= =?ISO-8859-1?Q?x-foundation.org?= =?ISO-8859-1?Q?> 1 sibling, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-19 0:22 UTC (permalink / raw) To: David Kastrup Cc: Junio C Hamano, Matthieu Moy, Johannes Schindelin, Git Mailing List On Thu, 19 Jul 2007, David Kastrup wrote: > > Well, kudos. Together with the analysis from Junio, this seems like a > good start. Would you have any recommendations about what stuff one > should really read in order to get up to scratch about git internals? Well, you do need to understand the index. That's where all the new subtlety happens. The data structures themselves are trivial, and we've supported empty trees (at the top level) from the beginning, so that part is not anything new. However, now having a new entry type in the index (S_IFDIR) means that anything that interacts with the index needs to think twice. But a lot of that is just testing what happens, and so the first thing to do is to have a test-suite. There's also the question about how to show an empty tree in a diff. We've never had that: the only time we had empty trees was when we compared a totally empty "root" tree against another tree, and then it was obvious. But what if the empty tree is a subdirectory of another tree - how do you express that in a diff? Do you care? Right now, since we always recurse into the tree (and then not find anything), empty trees will simply not show up _at_all_ in any diffs. And what about usability issues elsewhere? With my patch, doing something like a git add directory/ still won't do anything, because the behaviour of "git add" has always been to recurse into directories. 
So to add a new empty directory, you'd have to do git update-index --add directory and that's not exactly user-friendly. So do you add a "-n" flag to "git add" to tell it to not recurse? Or do you always recurse, but then if you notice that the end result is empty, you add it as a directory? All the logic for that whole directory lookup is in git/dir.c, and that code takes various flags because different programs want different things (show "ignored" files, or ignore them? Show empty directories or ignore them? etc). So primarily, I think the job is: - thinking about the index, and the interactions when adding a directory or adding files under a directory that already exists. I *think* we get all the corner cases right, because they should be exactly the same as with subprojects, but hey, maybe there's some piece that tests S_ISGITLINK() and now needs a S_ISDIR() test too.. - adding test cases - thinking about the user interfaces for this, and adding code to handle directories where needed (eg the above "git add" issue). - thinking about merges (which is largely about the index too, but is a whole 'nother set of issues, with multiple stages in the same index at the same time) It might all be trivial. The directory traversal already knows that empty directories are special, so getting the right behaviour to "git add" may be really really easy. Or maybe it's not. I think a lot of it is just finding what needs to be done, seeing if we already do it, and if not, seeing how to do it. Boring test-cases, in other words. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 0:22 ` Linus Torvalds @ 2007-07-19 5:28 ` Junio C Hamano 2007-07-19 5:38 ` Shawn O. Pearce 2007-07-19 5:59 ` David Kastrup 0 siblings, 2 replies; 137+ messages in thread From: Junio C Hamano @ 2007-07-19 5:28 UTC (permalink / raw) To: Linus Torvalds Cc: David Kastrup, Matthieu Moy, Johannes Schindelin, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 19 Jul 2007, David Kastrup wrote: >> >> Well, kudos. Together with the analysis from Junio, this seems like a >> good start. Would you have any recommendations about what stuff one >> should really read in order to get up to scratch about git internals? > > Well, you do need to understand the index. That's where all the new > subtlety happens. > > The data structures themselves are trivial, and we've supported empty > trees (at the top level) from the beginning, so that part is not anything > new. > > However, now having a new entry type in the index (S_IFDIR) means that > anything that interacts with the index needs to think twice. But a lot of > that is just testing what happens, and so the first thing to do is to have > a test-suite. > > There's also the question about how to show an empty tree in a diff. We've > never had that: the only time we had empty trees was when we compared a > totally empty "root" tree against another tree, and then it was obvious. > But what if the empty tree is a subdirectory of another tree - how do you > express that in a diff? Do you care? Right now, since we always recurse > into the tree (and then not find anything), empty trees will simply not > show up _at_all_ in any diffs. > > And what about usability issues elsewhere? With my patch, doing something > like a > > git add directory/ > > still won't do anything, because the behaviour of "git add" has always > been to recurse into directories. 
So to add a new empty directory, you'd > have to do > > git update-index --add directory > > and that's not exactly user-friendly. > > So do you add a "-n" flag to "git add" to tell it to not recurse? Or do > you always recurse, but then if you notice that the end result is empty, > you add it as a directory? Another issue I thought about was what you would do in the step 3 in the following: 1. David says "mkdir D; git add D"; you add S_IFDIR entry in the index at D; 2. David says "date >D/F; git add D/F"; presumably you drop D from the index (to keep the index more backward compatible) and add S_IFREG entry at D/F. 3. David says "git rm D/F". Have we stopped keeping track of the "empty directory" at this point? ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 5:28 ` Junio C Hamano @ 2007-07-19 5:38 ` Shawn O. Pearce 2007-07-19 6:08 ` David Kastrup ` (2 more replies) 2007-07-19 5:59 ` David Kastrup 1 sibling, 3 replies; 137+ messages in thread From: Shawn O. Pearce @ 2007-07-19 5:38 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, David Kastrup, Matthieu Moy, Johannes Schindelin, Git Mailing List Junio C Hamano <gitster@pobox.com> wrote: > Another issue I thought about was what you would do in the step > 3 in the following: > > 1. David says "mkdir D; git add D"; you add S_IFDIR entry in > the index at D; > > 2. David says "date >D/F; git add D/F"; presumably you drop D > from the index (to keep the index more backward compatible) > and add S_IFREG entry at D/F. > > 3. David says "git rm D/F". > > Have we stopped keeping track of the "empty directory" at this > point? Sadly yes. But I don't think that's what the folks who want to track empty directories want to have happen here. Which is why I'm thinking we just need to track the directory, as a node in the index, even if there are files in it, and even if we got that directory and its contained files there by just unpacking trees. -- Shawn. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 5:38 ` Shawn O. Pearce @ 2007-07-19 6:08 ` David Kastrup 2007-07-19 7:10 ` Geoff Russell 2007-07-19 6:09 ` Shawn O. Pearce [not found] ` <9436820E-53D1-425D-922E-D4C76578E40A@silverinsanity.com> 2 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-19 6:08 UTC (permalink / raw) To: git "Shawn O. Pearce" <spearce@spearce.org> writes: > Sadly yes. But I don't think that's what the folks who want to > track empty directories want to have happen here. > > Which is why I'm thinking we just need to track the directory, as a > node in the index, even if there are files in it, and even if we got > that directory and its contained files there by just unpacking > trees. I have come to about the same conclusion. So if backward-compatibility is any concern, one needs to work with some sort of extension records, and designing them in a way that new-git add tree old-git rm tree will not leave empty subdirectories in the index will be tricky, to say the least. One will likely have to add an extension record "directory" for each directory as well as "my containing dir takes care of itself" to each file that has been added with new-git and has had its parent directory entered by other means. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 6:08 ` David Kastrup @ 2007-07-19 7:10 ` Geoff Russell 0 siblings, 0 replies; 137+ messages in thread From: Geoff Russell @ 2007-07-19 7:10 UTC (permalink / raw) To: git Dear gits, When I first started using git, I naively did $ mkdir NEWDIR && chmod BLAH NEWDIR $ git add NEWDIR I just expected that this was content in the current directory that I wanted tracked together with the permissions. It wasn't ... I spent a day or 2 thinking I was stupid, my version of git was corrupt, my machine was busted, .... etc. Eventually of course, I read the documentation (when all else fails) and realised that this perfectly obvious behaviour was not supported. The behaviour was obviously so obvious that eventually an error message was added telling all the people who hadn't read the documentation that trying to add a directory was 'fatal'. I put up with and work around this behaviour because git is so bloody brilliant at everything else. But it would be nice if it worked. Cheers, Geoff Russell ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 5:38 ` Shawn O. Pearce 2007-07-19 6:08 ` David Kastrup @ 2007-07-19 6:09 ` Shawn O. Pearce 2007-07-19 8:13 ` Matthieu Moy [not found] ` <9436820E-53D1-425D-922E-D4C76578E40A@silverinsanity.com> 2 siblings, 1 reply; 137+ messages in thread From: Shawn O. Pearce @ 2007-07-19 6:09 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, David Kastrup, Matthieu Moy, Johannes Schindelin, Git Mailing List "Shawn O. Pearce" <spearce@spearce.org> wrote: > Junio C Hamano <gitster@pobox.com> wrote: > > Another issue I thought about was what you would do in the step > > 3 in the following: > > > > 1. David says "mkdir D; git add D"; you add S_IFDIR entry in > > the index at D; > > > > 2. David says "date >D/F; git add D/F"; presumably you drop D > > from the index (to keep the index more backward compatible) > > and add S_IFREG entry at D/F. > > > > 3. David says "git rm D/F". > > > > Have we stopped keeping track of the "empty directory" at this > > point? > > Sadly yes. But I don't think that's what the folks who want to > track empty directories want to have happen here. > > Which is why I'm thinking we just need to track the directory, as a > node in the index, even if there are files in it, and even if we got > that directory and its contained files there by just unpacking trees. I take this back. I really don't want that behavior. If I do: mkdir -p foo/bar echo hello >foo/bar/world git add foo git rm -f foo/bar/world I never asked for foo/bar or foo to stay. In fact I want them to disappear from Git entirely, as foo/bar is now empty and has no content. But we also cannot do a special --mkdir option for update-index either, because how do we know that the user designated subtree is a directory we must always keep in the index? So I think the only way this works is to have a new mode that we use in tree (04755 ?) 
that tells us not only is this thing a subtree, but also that the user wants it to stay here, even if it is empty. Those trees are always in the index as a real tree entry, even if there are files contained in it. And as far as getting that directory entry created/removed from the index, well, I think a special flag to update-index would be in order, much like --chmod=[+-]x. Just my $0.0002 USD, which really ain't worth much at all. -- Shawn. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 6:09 ` Shawn O. Pearce @ 2007-07-19 8:13 ` Matthieu Moy 2007-07-19 10:51 ` Tomash Brechko 0 siblings, 1 reply; 137+ messages in thread From: Matthieu Moy @ 2007-07-19 8:13 UTC (permalink / raw) To: Shawn O. Pearce Cc: Junio C Hamano, Linus Torvalds, David Kastrup, Johannes Schindelin, Git Mailing List "Shawn O. Pearce" <spearce@spearce.org> writes: > If I do: > > mkdir -p foo/bar > echo hello >foo/bar/world > git add foo > git rm -f foo/bar/world > > I never asked for foo/bar or foo to stay. Well, outside git, if you do $ mkdir -p foo/bar $ echo hello > foo/bar/world $ rm -f foo/bar/world You didn't ask foo/bar to stay either, and still, it's quite natural to have it stay in your filesystem. So, the same way you'd have run "rm -r foo", it seems reasonable to me to ask for "git-rm -r foo" if the user wants to get rid of foo/ itself. -- Matthieu ^ permalink raw reply [flat|nested] 137+ messages in thread
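The plain-filesystem half of that comparison is easy to check. A small sketch, run in a throwaway directory; it only exercises `mkdir`, `rm` and `test`, not git, and the variable names are made up for illustration.

```shell
# Work in a scratch directory so nothing outside it is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir -p foo/bar
echo hello > foo/bar/world
rm -f foo/bar/world

# The file is gone, but its now-empty parent directories are still there.
if [ -d foo/bar ] && [ ! -e foo/bar/world ]; then
    left_behind=yes
else
    left_behind=no
fi
echo "empty foo/bar left behind: $left_behind"

# Getting rid of the directories takes a separate, explicit step.
rm -r foo
```

Whether "git rm" should mirror this, or should keep cleaning up directories that become empty, is exactly the point under discussion in this subthread.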
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 8:13 ` Matthieu Moy @ 2007-07-19 10:51 ` Tomash Brechko 2007-07-19 11:31 ` David Kastrup 2007-07-19 12:16 ` Johannes Schindelin 0 siblings, 2 replies; 137+ messages in thread From: Tomash Brechko @ 2007-07-19 10:51 UTC (permalink / raw) To: Git Mailing List Dear Git fellows, A year or so ago I too would strongly advocate the need of tracking empty directories, permissions et al., it seemed so "natural" and "plain obvious" to me back then. But since that time I learned to appreciate the "contents tracking" approach, and now view directories (paths in general) only as the means for Git to know where to put the contents on checkout. This, BTW, is consistent with how Git figures container copies/renames. No doubt mighty Git developers can add support for empty directories, manage to stay backward compatible, think out consistent user interface etc. But there's no end to how much information one may want to store in Git to make it "_file system_ contents tracking software". Starting with empty directories, one may argue then that certain installation trees also need particular file ownership, so lets store user/group names like tar does. It was mentioned already in this thread that in addition to 'rwx' we also would have to store ACLs (some OSes have only one of these concepts, some both), SELinux security contexts, perhaps other arbitrary file attributes that may be part of file system state. Wouldn't it be better to preserve Git as a contents tracking system, and add some tools on top of it that can translate file system state into textual (or binary) form, so it can be stored in current Git? And then use this textual representation to restore actual file system attributes/layout on checkout? And the only change in Git itself would be some more hooks, for instance one hook before checking out over the old work tree, and one after the checkout. Or one can simply wrap certain Git commands to implement such hooks. 
In any case, no one is going to be against the new feature if it won't break anything for those of us who find the pure contents tracking the right thing. And storing empty directories by default may not be natural for everyone. So before going into technical details of how this can possibly be implemented, could someone answer the following: 1 Is Git going to track directories _always_? Looks like not, because in this thread there seems to be a distinction between 'git add DIR' and 'git add DIR/FILE', i.e. not everyone is sure if in the last case Git should track DIR or not. 2 If Git will track only explicitly mentioned directories, then what about recursive operations? Will it add only files by default, or directories too? Perhaps there will be some --add-dirs option to 'git add'. 3 Since in certain recursive operations one will want to affect directories too, how .gitignore will look? Most files have a notion of extension, so we may say '*.o', but with directories things are a bit more complicated. One would want to say "exclude DIR2 only if under DIR1 at any hierarchy depth", i.e. exclude paths matching qr%DIR1/(.+/)?DIR2/%, and shell wildcards aren't that expressive, '*' doesn't cross hierarchy. Note that we live without this now, but this will be the next "natural" demand once directories become first class citizens. This list is surely incomplete. The point is that before we go into technical details, let's consider what exactly we are going to implement, how this will affect current usage model, how (empty) directory handling will extend to future similar demands, etc. My fear is that once some patch is around, it's very tempting to accept it. And once it is in, it's almost impossible to remove the feature later. Regards, -- Tomash Brechko ^ permalink raw reply [flat|nested] 137+ messages in thread
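The rule from question 3, "exclude DIR2 only if under DIR1 at any hierarchy depth", can be tried out on path strings with an extended regular expression. A rough sketch using `grep -E`; the helper name `under_dir1` and the sample paths are invented for illustration, and the leading `(^|/)` anchor is an addition so that a name such as `xDIR1` does not match by accident.

```shell
# Hypothetical helper: succeeds when the path contains a DIR2 component
# somewhere below a DIR1 component, at any depth.
under_dir1() {
    printf '%s\n' "$1" | grep -Eq '(^|/)DIR1/(.+/)?DIR2/'
}

for p in DIR1/DIR2/file.o DIR1/a/b/DIR2/file.o DIR2/file.o DIR3/DIR2/file.o; do
    if under_dir1 "$p"; then
        echo "exclude $p"
    else
        echo "keep    $p"
    fi
done
```

Only the first two sample paths are reported as excluded. This filters path names outside git, in the spirit of a wrapper script; it is not a .gitignore pattern.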
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 10:51 ` Tomash Brechko @ 2007-07-19 11:31 ` David Kastrup 2007-07-19 12:32 ` Tomash Brechko 2007-07-19 12:38 ` David Kastrup 2007-07-19 12:16 ` Johannes Schindelin 1 sibling, 2 replies; 137+ messages in thread From: David Kastrup @ 2007-07-19 11:31 UTC (permalink / raw) To: git Tomash Brechko <tomash.brechko@gmail.com> writes: > Dear Git fellows, > > A year or so ago I too would strongly advocate the need of tracking > empty directories, permissions et al., it seemed so "natural" and > "plain obvious" to me back then. But since that time I learned to > appreciate the "contents tracking" approach, and now view > directories (paths in general) only as the means for Git to know > where to put the contents on checkout. This, BTW, is consistent > with how Git figures container copies/renames. I'll answer to this based on my proposal of adding "A/B/. [dir]" as a separate entity to index and repository, keeping "[tree]" out of indices, and don't allow an empty "[tree]" into repositories. This is a very natural abstraction. > But there's no end to how much information one may want to store in > Git to make it "_file system_ contents tracking software". Starting > with empty directories, one may argue then that certain installation > trees also need particular file ownership, so lets store user/group > names like tar does. It was mentioned already in this thread that > in addition to 'rwx' we also would have to store ACLs (some OSes > have only one of these concepts, some both), SELinux security > contexts, perhaps other arbitrary file attributes that may be part > of file system state. A [dir] entry may be eventually be made to track any of this, like a [file] entry could. If one wished to do this. > Wouldn't it be better to preserve Git as a contents tracking system, > and add some tools on top of it that can translate file system state > into textual (or binary) form, so it can be stored in current Git? 
> And then use this textual representation to restore actual file > system attributes/layout on checkout? And the only change in Git > itself would be some more hooks, for instance one hook before > checking out over the old work tree, and one after the checkout. Or > one can simply wrap certain Git commands to implement such hooks. This is not good since "tracking" means "tracking". With your model, the metainformation would be dissociated from the information. Renames and moves would make ground beef of the metadata. > In any case, no one is going to be against the new feature if it > won't break anything for those of us who find the pure contents > tracking the right thing. My proposal would allow setting an option to track or not track directories implicitly by default. > And storing empty directories by default may not be natural for > everyone. So before going into technical details of how this can > possibly be implemented, could someone answer the following: I'll answer assuming the proposed model. > 1 Is Git going to track directories _always_? Looks like not, because > in this thread there seems to be a distinction between 'git add DIR' > and 'git add DIR/FILE', i.e. not everyone is sure if in the last > case Git should track DIR or not. Let's have a variable core.adddirs If you set core.adddirs to false, git will not enter directories into the index for addition. Consequently, they will not end up in the repository. If you git-rm a directory, the index will contain a notice to delete the directory along with deletion notices for all registered other elements of the directory. Committing this means that the directory will no longer be separately controlled by git, even if for some reason the repository has other files remaining in the tree. Something like the Linux kernel repository which may be accessed by ancient git versions would naturally contain "core.adddirs: false" in its default configuration file, and this would be passed around when cloning. 
So directory elements would stay out of it. > 2 If Git will track only explicitly mentioned directories, then what > about recursive operations? Will it add only files by default, or > directories too? Perhaps there will be some --add-dirs option to > 'git add'. There could be a commandline override for "core.adddirs". > 3 Since in certain recursive operations one will want to affect > directories too, how .gitignore will look? Most files have a notion > of extension, so me may say '*.o', but with directories things a bit > more complicated. One would want to say "exclude DIR2 only if under > DIR1 at any hierarchy depth", i.e. exclude paths matching > qr%DIR1/(.+/)?DIR1/%, and shell wildcards aren't that expressive, > '*' doesn't cross hierarchy. Note that we live without this now, > but this will be the next "natural" demand once directories become > first class citizens. Huh? I don't get this. It's like "we can't allow people to buy chocolate, or they'll demand next to have nuclear weapons delivered at their house". Deal with the demands as they come up. If a directory has a tree-local name ".", it can be dealt with in patterns if really needed. I don't see much of a necessity however. Although it would be natural to have core.adddirs: false be equivalent to core.excludefile: . And so it might be possible to actually not need a separate core.adddirs option at all, technically. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 11:31 ` David Kastrup @ 2007-07-19 12:32 ` Tomash Brechko 2007-07-19 12:46 ` David Kastrup 2007-07-19 12:38 ` David Kastrup 1 sibling, 1 reply; 137+ messages in thread From: Tomash Brechko @ 2007-07-19 12:32 UTC (permalink / raw) To: git Hi David, On Thu, Jul 19, 2007 at 13:31:50 +0200, David Kastrup wrote: > core.excludefile: . Really nice idea to give directories 'DIR/.' name. I'm sure there are several other ways to implement your proposal. But why put it in Git itself? Decomposition and abstraction principle tells me that this should go to some other place. Please consider this: I myself use Git to track my own local projects, and for this usage your proposal has no value for me, i.e. as a _Source_ Code Management system Git is rather complete. But I also track /etc and ~/ in Git, and for this I'd love to have directories, permissions, ownership, other attributes, to be tracked. I have a Perl script wrapping Git that allows me to filter tracked paths by full regexps instead of Git's file globs, and also to filter out too big files assuming that they are binary anyway. Most people solving the same problem moved further and implemented tools to store part of file system state (permissions and ownership) in a textual representation, to track that in Git. I'm sure you've seen such posts in the list. And my point is that rather than building the support for all of it into core Git, and then implementing sophisticated configuration to disable parts of it, wouldn't it be better to have separate tools orthogonal to Git itself? At the extreme case (probably not really seriously), consider the following design: there are two layers, file system layer, and contents layer. On checkout file system layer creates (or examines existing) directory tree along with all files and their file system state (permissions, ownership, ACLs, attributes, ...), and then asks contents layer to update the contents. 
This way layers are independent, and file system layer may be implemented on top of pure contents tracking. File system layer may be extended to be made particular OS/FS dependent if some development team wishes so. Even hard links may be supported: since file system layer may decide to remember that two paths really reference the same inode (i.e. contents), contents layer may be asked to update the data only once with either file name/descriptor. This, BTW, is why I think not tracking file attributes when versioning, say, /etc, is not a big loss. When I will move to the new system, I will mostly be interested in contents diffs of the same configuration files in /etc. I will trust their new attributes, and will not want to restore them to what they were on the old system. So the essence of my objection is that we should not pollute core Git with file system state tracking more than it's required to know where to put the contents to. Everything else should go elsewhere. Again, I'd love to have your proposal be implemented, but only in a way that won't interfere with pure SCM's operations. -- Tomash Brechko ^ permalink raw reply [flat|nested] 137+ messages in thread
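A very small version of the "file system layer on top of pure contents tracking" idea might look like the sketch below: directory modes are written to a text manifest that an unmodified git can track like any other file, and reapplied after a checkout. This is only an illustration under stated assumptions, not an existing tool: it assumes GNU `stat` (the `-c` option) and path names without newlines, and the file name `.dirmodes` and both function names are made up.

```shell
# Record one "MODE PATH" line per directory, skipping git's own metadata.
save_dir_modes() {
    find . -path ./.git -prune -o -type d -exec stat -c '%a %n' {} + \
        | sort -k2 > .dirmodes
}

# Reapply the manifest after a checkout.
restore_dir_modes() {
    # First pass: recreate missing directories (this brings back empty ones).
    while read -r mode path; do mkdir -p "$path"; done < .dirmodes
    # Second pass: restore the recorded modes.
    while read -r mode path; do chmod "$mode" "$path"; done < .dirmodes
}

# Demonstration in a scratch directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p spool/incoming
chmod 777 spool/incoming
save_dir_modes
rmdir spool/incoming      # stand-in for a checkout that dropped the empty directory
restore_dir_modes
restored_mode=$(stat -c '%a' spool/incoming)
echo "spool/incoming restored with mode $restored_mode"
```

The two functions could be called from a wrapper script around commit and checkout, or from hooks where suitable ones exist. The manifest goes stale whenever a directory is moved or renamed without regenerating it.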
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 12:32 ` Tomash Brechko @ 2007-07-19 12:46 ` David Kastrup 2007-07-23 20:18 ` Nix 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-19 12:46 UTC (permalink / raw) To: git Tomash Brechko <tomash.brechko@gmail.com> writes: > Hi David, > > On Thu, Jul 19, 2007 at 13:31:50 +0200, David Kastrup wrote: >> core.excludefile: . > > Really nice idea to give directories 'DIR/.' name. I'm sure there are > several other ways to implement your proposal. But why to put in in > Git itself? Decomposition and abstraction principle tells me that > this should go to some other place. Because of a fundamental law of computation: information maintained in two separate places will get out of synch eventually. > Please consider this: I myself use Git to track my own local > projects, and for this usage you proposal have no value for me, > i.e. as a _Source_ Code Management system Git is rather complete. > But I also track /etc and ~/ in Git, and for this I'd love to have > directories, permissions, ownership, other attributes, to be > tracked. I have Perl script wrapping Git that allows me to filter > tracked paths by full regexps instead of Git's file globs, and also > to filter out too big files assuming that they are binary anyway. Look, git _tracks_ contents. Your permissions managements needs to be told explicitly when and how things change. So you end up with git _tracking_ material and your permissions/directory management needing the level of manual handholding Subversion demands. > And my point is that rather than building the support for all of it > into core Git, and then implementing sophisticated configuration to > disable parts of it, wouldn't it be better to have a separate tools > orthogonal to Git itself? And my personal answer to that is "no". We don't want orthogonality for intimately related things, because it forces us to work the "orthogonal" things in lockstep. 
And if you force git to operate in lockstep with manual explicit tracking, then git becomes useless for tracking stuff automatically. > So the essence of my objection is that we should not pollute core > Git with file system state tracking more than it's required to know > where to put the contents to. Everything else should go elsewhere. > > Again, I'd love to have your proposal be implemented, but only in a > way that won't interfere with pure SCM's operations. Tell git to ignore "." and it won't "interfere". -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 12:46 ` David Kastrup @ 2007-07-23 20:18 ` Nix 2007-07-23 20:49 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: Nix @ 2007-07-23 20:18 UTC (permalink / raw) To: David Kastrup; +Cc: git On 19 Jul 2007, David Kastrup stated: > Tomash Brechko <tomash.brechko@gmail.com> writes: >> Please consider this: I myself use Git to track my own local >> projects, and for this usage your proposal has no value for me, >> i.e. as a _Source_ Code Management system Git is rather complete. >> But I also track /etc and ~/ in Git, and for this I'd love to have >> directories, permissions, ownership, other attributes, to be >> tracked. I have a Perl script wrapping Git that allows me to filter >> tracked paths by full regexps instead of Git's file globs, and also >> to filter out too big files assuming that they are binary anyway. > > Look, git _tracks_ contents. Your permissions management needs to be > told explicitly when and how things change. So you end up with git > _tracking_ material and your permissions/directory management needing > the level of manual handholding Subversion demands. Actually, if we had a post-checkout hook, we could use a pre-commit hook to keep track of directory existence, permissions, et seq, and a post-checkout hook to restore them. (But we don't, at least not yet. Adding one is probably quite easy.) ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 20:18 ` Nix @ 2007-07-23 20:49 ` David Kastrup 2007-07-23 21:49 ` Nix 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-23 20:49 UTC (permalink / raw) To: Nix; +Cc: git Nix <nix@esperi.org.uk> writes: > On 19 Jul 2007, David Kastrup stated: >> Tomash Brechko <tomash.brechko@gmail.com> writes: >>> Please consider this: I myself use Git to track my own local >>> projects, and for this usage you proposal have no value for me, >>> i.e. as a _Source_ Code Management system Git is rather complete. >>> But I also track /etc and ~/ in Git, and for this I'd love to have >>> directories, permissions, ownership, other attributes, to be >>> tracked. I have Perl script wrapping Git that allows me to filter >>> tracked paths by full regexps instead of Git's file globs, and also >>> to filter out too big files assuming that they are binary anyway. >> >> Look, git _tracks_ contents. Your permissions managements needs to >> be told explicitly when and how things change. So you end up with >> git _tracking_ material and your permissions/directory management >> needing the level of manual handholding Subversion demands. > > Actually, if we had a post-checkout hook, we could use a pre-commit > hook to keep track of directory existence, permissions, et seq, and > a post- checkout hook to restore them. Actually, tracking permissions would be cheap: one just needs to replace the permission-munging macros in git with identity. Ownership -- well, that's harder. But my sentiment remains: git _tracks_ stuff: it notices when things move around and follows them. Statically snapshotting permissions creates a layer that is quite less flexible. The information gets detached. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 20:49 ` David Kastrup @ 2007-07-23 21:49 ` Nix 2007-07-23 22:05 ` Nix 2007-07-23 22:16 ` David Kastrup 0 siblings, 2 replies; 137+ messages in thread From: Nix @ 2007-07-23 21:49 UTC (permalink / raw) To: David Kastrup; +Cc: git On 23 Jul 2007, David Kastrup uttered the following: > Nix <nix@esperi.org.uk> writes: >> Actually, if we had a post-checkout hook, we could use a pre-commit >> hook to keep track of directory existence, permissions, et seq, and >> a post- checkout hook to restore them. > > Actually, tracking permissions would be cheap: one just needs to > replace the permission-munging macros in git with identity. Ownership > -- well, that's harder. > > But my sentiment remains: git _tracks_ stuff: it notices when things > move around and follows them. Statically snapshotting permissions > creates a layer that is quite less flexible. The information gets > detached. Not if you record it in a file which is checked in in the same commit that is tracked, it isn't (that's what the pre-commit hook is for). It's true that git won't natively have any knowledge of that data, but Linus has fairly effectively shown that it shouldn't have any such knowledge and doesn't need it. (You might want to give git-diff knowledge of it, just so it can skip it unless a new flag is given. Give the file a nice format, and bingo, readable permission/ownership diffs!) (I'd recommend storing the names of user/group file owners as well as the uids, so you can --- given suitable permissions --- chown to the right username in preference to uid if that user exists at checkout time.) Doing this *efficiently* is another matter: probably a pair of hooks are needed, run on pre-checkout and post-checkout: they can communicate so as only to fiddle permissions on things which are newly appeared or whose permissions have changed. 
Obviously because the permissions, ownerships et al aren't recorded in the index this will slow committing down, but given that git-update-index will already have sucked the entire tree's inodes into the page cache anyway, I don't think a second pass over the working tree snarfing permissions would slow it down much. As I need this anyway (I'm backing up a filesystem via git, yes, I'm insane but I need version control and it's horrifically redundant so packing it will save heaps of space), I guess I'd better get off my rear and write the code. (The recent commit-as-a-builtin's introduction of a run_hook() function will be pretty damn useful: good timing, I guess.) ^ permalink raw reply [flat|nested] 137+ messages in thread
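The manifest idea described above (record permissions in a file that is committed alongside the content, restore them after checkout) could be sketched roughly as follows. This is only an illustration, not code from the thread: the file name `.permissions`, the function names, and the "type mode path" line format are all assumptions, and it relies on GNU find's `-printf`. Ownership is left out on purpose, since restoring it needs privileges and raises the merge questions discussed later in the thread.

```shell
#!/bin/sh
# Hypothetical manifest name; git itself knows nothing about this file.
MANIFEST=.permissions

# Record "<type> <octal mode> <path>" for everything below the current
# directory, skipping .git and the manifest itself. Empty directories are
# recorded too, which is what lets restore_perms recreate them.
# Assumes GNU find (-printf is not POSIX).
save_perms() {
    find . -path ./.git -prune -o ! -path . ! -name "$MANIFEST" \
        -printf '%y %m %p\n' | LC_ALL=C sort -k3 > "$MANIFEST"
}

# Recreate missing directories, then reapply the recorded modes.
# Paths containing newlines are not handled by this simple format.
restore_perms() {
    # Pass 1: bring back directories that git did not check out.
    while read -r type mode path; do
        if [ "$type" = d ]; then
            mkdir -p "$path"
        fi
    done < "$MANIFEST"
    # Pass 2: apply modes to entries that still exist; symlinks are skipped
    # because chmod would act on their target.
    while read -r type mode path; do
        if [ -e "$path" ] && [ ! -L "$path" ]; then
            chmod "$mode" "$path"
        fi
    done < "$MANIFEST"
}
```

In the scheme discussed here, `save_perms` would run from a pre-commit hook (followed by adding the manifest to the index) and `restore_perms` from a post-checkout hook, if such a hook is available.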
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 21:49 ` Nix @ 2007-07-23 22:05 ` Nix 2007-07-23 22:52 ` Jakub Narebski 2007-07-23 22:16 ` David Kastrup 1 sibling, 1 reply; 137+ messages in thread From: Nix @ 2007-07-23 22:05 UTC (permalink / raw) To: David Kastrup; +Cc: git On 23 Jul 2007, nix@esperi.org.uk outgrape: > (I'd recommend storing the names of user/group file owners as well as > the uids, so you can --- given suitable permissions --- chown to the > right username in preference to uid if that user exists at checkout > time.) Suddenly this gets more complex. git-merge-file(1) has to understand the contents of this file, so as not to consider merges conflicting unless two files actually have different permissions (i.e. doing a line by line diff, and combining the two such that at most one file with a given name exists in the result), and so as not to consider lines with differing ownerships conflicting unless we're running under a uid in which we can change ownerships at all. (I'd like to track ownership but it's looking like a bit of a nest of snakes.) And the problem is that while git has a lot of strategies for merging *trees*, its file merge system is totally unpluggable: it just falls back to xdiff's merging system. I guess I'll have to add that feature :) (How does this cope with binary files, I wonder? I seem to recall something about that flying past back before the volume of the git list overwhelmed me...) ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 22:05 ` Nix @ 2007-07-23 22:52 ` Jakub Narebski 2007-07-25 22:43 ` Nix 0 siblings, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2007-07-23 22:52 UTC (permalink / raw) To: git Nix wrote: > And the problem is that while git has a lot of strategies for merging > *trees*, its file merge system is totally unpluggable: it just falls > back to xdiff's merging system. I guess I'll have to add that feature :) Not true. You can add custom diff driver for files using gitattributes system. > (How does this cope with binary files, I wonder? I seem to recall > something about that flying past back before the volume of the git list > overwhelmed me...) xdiff has binary diff, and git has some kind of "ascii-armored" binary diff output. As to how to merge binary files: I suspect that they always conflict, unless the merge is trivial. -- Jakub Narebski Warsaw, Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 22:52 ` Jakub Narebski @ 2007-07-25 22:43 ` Nix 0 siblings, 0 replies; 137+ messages in thread From: Nix @ 2007-07-25 22:43 UTC (permalink / raw) To: Jakub Narebski; +Cc: git On 23 Jul 2007, Jakub Narebski spake thusly: > Nix wrote: > >> And the problem is that while git has a lot of strategies for merging >> *trees*, its file merge system is totally unpluggable: it just falls >> back to xdiff's merging system. I guess I'll have to add that feature :) > > Not true. You can add custom diff driver for files using gitattributes > system. Oo. Excellent, I didn't notice that. Thank you. ^ permalink raw reply [flat|nested] 137+ messages in thread
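Besides diff drivers, the gitattributes mechanism also lets a path be assigned a custom merge driver, which is closer to what a permissions manifest would need. A minimal sketch of the wiring, assuming a manifest called `.permissions` and a merge program called `merge-perms` (both names are made up for illustration; only the `merge.<name>.driver` configuration and the `merge=` attribute are git's own):

```shell
#!/bin/sh
# Register a custom merge driver for the hypothetical .permissions manifest.
# "merge-perms" is an invented program name: it would be handed the
# ancestor (%O), current (%A) and other (%B) versions, and is expected to
# leave the merged result in the file named by %A.
setup_perms_merge_driver() {
    git config merge.perms.name "entry-wise merge of the permissions manifest" &&
    git config merge.perms.driver "merge-perms %O %A %B" &&
    echo ".permissions merge=perms" >> .gitattributes
}
```

The driver itself would still have to implement the per-entry comparison Nix describes; this only shows where it would plug in.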
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 21:49 ` Nix 2007-07-23 22:05 ` Nix @ 2007-07-23 22:16 ` David Kastrup 2007-07-23 22:31 ` Linus Torvalds 1 sibling, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-23 22:16 UTC (permalink / raw) To: Nix; +Cc: git Nix <nix@esperi.org.uk> writes: > On 23 Jul 2007, David Kastrup uttered the following: >> Nix <nix@esperi.org.uk> writes: >>> Actually, if we had a post-checkout hook, we could use a pre-commit >>> hook to keep track of directory existence, permissions, et seq, and >>> a post-checkout hook to restore them. >> >> Actually, tracking permissions would be cheap: one just needs to >> replace the permission-munging macros in git with identity. Ownership >> -- well, that's harder. >> >> But my sentiment remains: git _tracks_ stuff: it notices when things >> move around and follows them. Statically snapshotting permissions >> creates a layer that is quite less flexible. The information gets >> detached. > > Not if you record it in a file which is checked in in the same > commit that is tracked, it isn't (that's what the pre-commit hook is > for). I have my doubts that anybody but git actually has a clue what to snapshot when, and where to place it: don't forget that index manipulation and committing are done at different times, and you need not even commit all of the index. > It's true that git won't natively have any knowledge of that data, > but Linus has fairly effectively shown that it shouldn't have any > such knowledge and doesn't need it. Last time I looked, git tracked the executable bit. For kernel development, this is pretty much what it takes, and with collaborative work, tracking anything but the owner permissions is going to lead to annoying and verbose merge behavior quite a lot. And of the owner permissions, r and w complicate proper handling when unset. But being able to specify other masks for applications other than multi-site collaborative development would likely not hurt. 
> Doing this *efficiently* is another matter: probably a pair of hooks > are needed, run on pre-checkout and post-checkout: they can > communicate so as only to fiddle permissions on things which are > newly appeared or whose permissions have changed. > > Obviously because the permissions, ownerships et al aren't recorded > in the index this will slow committing down, It will also detach the time where the file contents and the permissions get recorded. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 22:16 ` David Kastrup @ 2007-07-23 22:31 ` Linus Torvalds 2007-07-23 23:32 ` Nix [not found] ` <86ps2ithyl.fsf@lola.quinscape.zz> 0 siblings, 2 replies; 137+ messages in thread From: Linus Torvalds @ 2007-07-23 22:31 UTC (permalink / raw) To: David Kastrup; +Cc: Nix, git On Tue, 24 Jul 2007, David Kastrup wrote: > Nix <nix@esperi.org.uk> writes: > > > It's true that git won't natively have any knowledge of that data, > > but Linus has fairly effectively shown that it shouldn't have any > > such knowledge and doesn't need it. > > Last time I looked, git tracked the executable bit. Actually, originally it tracked the whole mode word. It was a total disaster. People who had different umasks etc got mode clashes all the time, and you ended up having silly and unnecessary conflicts. The same would be true (to an even higher degree) if we tracked owner and group information etc. So practically speaking, you want to track the *minimal* possible state, not the maximal one. This is one of those "in theory" vs "in practice" things. In *theory*, it would be nice for an SCM to track everything that is known about a file. In *practice*, that sucks. So this does mean that if you want to explicitly track certain things (ownership and more complete file permissions, or ACL's, or "resource forks", or any number of other things that a file *could* have on various systems), you end up having to track them in something other than git, or you end up having to track them as a separate "metadata file". One such metadata file is, for example, the ".gitattributes" file. It *could* be used to contain things like path-based rules for ownership, not just things like whether to check out with CRLF etc. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 22:31 ` Linus Torvalds @ 2007-07-23 23:32 ` Nix 2007-07-23 23:57 ` Linus Torvalds [not found] ` <86ps2ithyl.fsf@lola.quinscape.zz> 1 sibling, 1 reply; 137+ messages in thread From: Nix @ 2007-07-23 23:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Kastrup, git On 23 Jul 2007, Linus Torvalds spake thusly: > So practically speaking, you want to track the *minimal* possible state, > not the maximal one. I think it depends on your use case. For source code and indeed anything with heavy merges, this is true: but I'm increasingly using git as a sort of `merged historical tar' to store images of entire random filesystem trees across time, and gaining the benefit of the packer's lovely space-efficiency as well (doing this with svn would be a lost cause, twice the space usage before you even think about the repository). And in that case, preserving everything you can makes sense. (Perhaps what I should be doing is tarring the directory tree up and storing the *tarball* in git. I'll try that and see what it does to pack sizes. These are version-controlled backups of my mother's magnum opus in progress so you can understand that I don't want to destroy them accidentally: I'd never hear the end of it! ;) ) > So this does mean that if you want to explicitly track certain things > (ownership and more complete file permissions, or ACL's, or "resource > forks", or any number of other things that a file *could* have on various > systems), you end up havign to track them in something else than git, or > you end up having to track them as a separate "metadata file". Yes indeed: that's why I proposed doing this using a couple of new hooks driving entirely optional permissions-preservation stuff. Most use cases really won't want to track this, so this sort of stuff shouldn't impose upon the git core or upon anyone who doesn't want it. 
(However, the ability to have alternative file merging strategies *may* be useful elsewhere, perhaps.) ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 23:32 ` Nix @ 2007-07-23 23:57 ` Linus Torvalds 0 siblings, 0 replies; 137+ messages in thread From: Linus Torvalds @ 2007-07-23 23:57 UTC (permalink / raw) To: Nix; +Cc: David Kastrup, git On Tue, 24 Jul 2007, Nix wrote: > > On 23 Jul 2007, Linus Torvalds spake thusly: > > So practically speaking, you want to track the *minimal* possible state, > > not the maximal one. > > I think it depends on your use case. For source code and indeed anything > with heavy merges, this is true Yes, very obviously. Git is targeted towards source code and working in a distributed manner across a very wide variety of users and setups, while something that would be more targeted towards a special scenario and much stricter usage would find that the "minimum" set is much bigger, and might well include ACL's and user information. > but I'm increasingly using git as a sort of `merged historical tar' to > store images of entire random filesystem trees across time, and gaining > the benefit of the packer's lovely space-efficiency as well (doing this > with svn would be a lost cause, twice the space usage before you even > think about the repository). And in that case, preserving everything you > can makes sense. On the other hand, almost all the space-efficiency comes from things that delta well, and change quickly. That includes the file data itself (and very much the tree contents), but it doesn't necessarily include things like permissions and user information - mainly because that doesn't actually delta at all (not because it can't, but because it hardly ever changes, and when it does change, it often changes all over the map). To make an example of your "tar" situation: if you want to be space- efficient in a tar-like setting, you should *not* make user information be something that is per-file at all! Why? Because in 99% of all tar-files, there is a single user name. 
So even your usage *may* actually be much better off using git as a "data backend", and using something totally different for "user/group" information. Yes, you'd have to make a "shim layer" on top of git to hide the fact that the user information is handled separately, but that shouldn't be that hard per se. > (Perhaps what I should be doing is tarring the directory tree up and > storing the *tarball* in git. I'll try that and see what it does to pack > sizes. These are version-controlled backups of my mother's magnum opus > in progress so you can understand that I don't want to destroy them > accidentally: I'd never hear the end of it! ;) ) You don't want to do this. There's a few reasons, but the two big ones are: - the git delta logic is strictly a "single delta base" thing. Yes, git would be able to find the delta's between two tar-files (as long as you don't compress them), and express one tar-file in terms of the other, and it would probably save a fair amount of disk. But it would not be able to do _nearly_ as well as it can if you store individual files, and let git just find the best delta per-file (and not just "one delta base for the whole tar-ball") - git is very much optimized for "many small files". Yes, you can check in large files, and it works fine, but quite frankly, all the design and heavy optimizations have been about having trees with tens of thousands of files, but the files individually reasonably small. A lot of the speed advantages of git come from efficiently pruning away whole sub-directory structures, for example, and not even touching the data at all! So if you track just one file that changes in every version, all the things that make git fly are basically disabled, and you won't take full advantage of what git does. > Yes indeed: that's why I proposed doing this using a couple of new hooks > driving entirely optional permissions-preservation stuff. 
Most use cases > really won't want to track this, so this sort of stuff shouldn't impose > upon the git core or upon anyone who doesn't want it. (However, the > ability to have alternative file merging strategies *may* be useful > elsewhere, perhaps.) The ".gitattributes" file really could be used for some of that. Using it to track ownership and full permissions would not be impossible, and it could have interesting semantics (especially as .gitattributes is path pattern based - so you could literally do a "user" attribute, and say that everything in a particular subdirectory is owned by a particular user). That wouldn't be UNIX-like semantics, of course, but it can be very useful for certain things. Taking an example of something totally independent of git, look at how "udev" handles permissions, for example. In situations like that, static user information is useless, and it actually ends up setting up modes and ownership based on name-based patterns rather than having each file have a permission/user (because individual files appear and disappear, the name-based patterns are the things that matter). So if you *just* want to track a regular filesystem layout, that's not the right thing, but "udev" does show an example of a totally different way of describing ownership and permissions, and one which wouldn't actually be at all foreign to git. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
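To make the pattern-based idea concrete, here is a small sketch of resolving an owner from path patterns instead of storing one per file. Everything in it is illustrative: the rule table, the user names and the function name are invented, matching uses plain shell `case` globbing (where `*` also matches `/`), and the first matching rule wins. It does not reproduce the actual matching rules of .gitattributes.

```shell
#!/bin/sh
# Print the owner that a (hypothetical) rule table assigns to a path.
# Rule format: "<glob> <owner>", most specific rules first; the first
# match wins. Returns non-zero if no rule applies.
owner_for() {
    path=$1
    while read -r pattern owner; do
        case $path in
            $pattern) echo "$owner"; return 0 ;;
        esac
    done <<'EOF'
var/www/* www-data
home/alice/* alice
etc/* root
EOF
    return 1
}
```

For example, `owner_for etc/passwd` prints `root`, while a path that matches no rule makes the function return non-zero. A checkout helper could then chown according to the rule that matched, so files that appear later are covered without any per-file record.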
[parent not found: <86ps2ithyl.fsf@lola.quinscape.zz>]
* Re: [RFC PATCH] Re: Empty directories... [not found] ` <86ps2ithyl.fsf@lola.quinscape.zz> @ 2007-07-24 6:56 ` Nix 0 siblings, 0 replies; 137+ messages in thread From: Nix @ 2007-07-24 6:56 UTC (permalink / raw) To: David Kastrup; +Cc: Linus Torvalds, git On 24 Jul 2007, David Kastrup spake thusly: > But merging will become nicer if the permissions actually stay > associated with the file rather than the file name. Even in things > like /etc backups, blobs not infrequently relocate from one place to > another when the system gets updated. Even without that we'd need to merge without context, i.e. with totally independent lines, for such a file. So it's not the standard git file merge. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 11:31 ` David Kastrup 2007-07-19 12:32 ` Tomash Brechko @ 2007-07-19 12:38 ` David Kastrup 2007-07-19 13:21 ` David Kastrup 1 sibling, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-19 12:38 UTC (permalink / raw) To: git David Kastrup <dak@gnu.org> writes: > Although it would be natural to have > core.adddirs: false > be equivalent to > core.excludefile: . > > And so it might be possible to actually not need a separate > core.adddirs option at all, technically. To followup on myself here: A project such as the linux kernel which presumably does not want to have directories tracked will put the single pattern . into its top-level .gitignore file. That is all. At least if it does not confuse current versions of git to do ugly things. A separate option core.adddirs is still necessary because man gitignore states: When deciding whether to ignore a path, git normally checks gitignore patterns from multiple sources, with the following order of precedence: · Patterns read from the file specified by the configuration variable core.excludesfile. · Patterns read from $GIT_DIR/info/exclude. · Patterns read from a .gitignore file in the same directory as the path, or in any parent directory, ordered from the deepest such file to a file in the root of the repository. These patterns match relative to the location of the .gitignore file. A project normally includes such .gitignore files in its repository, containing patterns for files generated as part of the project build. The priority for "core.adddirs", however, should be below that so that preferences set in the repository's .gitignore files take precedence. So core.excludesfile seems to be the wrong place. A project with the policy of always tracking directories would place !. into its top-level .gitignore file. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 12:38 ` David Kastrup @ 2007-07-19 13:21 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-19 13:21 UTC (permalink / raw) To: git David Kastrup <dak@gnu.org> writes: > David Kastrup <dak@gnu.org> writes: > >> Although it would be natural to have >> core.adddirs: false >> be equivalent to >> core.excludefile: . >> >> And so it might be possible to actually not need a separate >> core.adddirs option at all, technically. > > To followup on myself here: > > A project such as the linux kernel which presumably does not want to > have directories tracked will put the single pattern > . > into its top-level .gitignore file. That is all. At least if it does > not confuse current versions of git to do ugly things. Another followup: it doesn't. I placed a single line . into a .gitignore file. This did not cause git to ignore the contents of ., and even git-add . worked as previously, namely adding the contents of the current directory and subdirectories to the index. In short: the gitignore idea for policing directory management is perfectly upwards-compatible with current versions of git. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 10:51 ` Tomash Brechko 2007-07-19 11:31 ` David Kastrup @ 2007-07-19 12:16 ` Johannes Schindelin 2007-07-19 12:24 ` David Kastrup 1 sibling, 1 reply; 137+ messages in thread From: Johannes Schindelin @ 2007-07-19 12:16 UTC (permalink / raw) To: Tomash Brechko; +Cc: Git Mailing List Hi, On Thu, 19 Jul 2007, Tomash Brechko wrote: > A year or so ago I too would strongly advocate the need of tracking > empty directories, permissions et al., it seemed so "natural" and "plain > obvious" to me back then. But since that time I learned to appreciate > the "contents tracking" approach, and now view directories (paths in > general) only as the means for Git to know where to put the contents on > checkout. This, BTW, is consistent with how Git figures container > copies/renames. Thank you. It is my impression, too, that after a while it becomes obvious what is good and what is not. FWIW I just whipped up a proof-of-concept patch (so at least _I_ cannot be accused of chickening out of writing code): This adds the command line option "--add-empty-dirs" to "git add", which does the only sane thing: putting a placeholder into that directory, and adding that. Since ".gitignore" is already a reserved file name in git, it is used as the name of this place holder. --- It is probably not fool-proof yet, needs documentation and a test case. But I am really sick and tired of this discussion. 
 builtin-add.c |   25 +++++++++++++++++++++----
 dir.c         |   16 +++++++++++++++-
 dir.h         |    3 ++-
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 7345479..1294840 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -47,7 +47,7 @@ static void prune_directory(struct dir_struct *dir, const char **pathspec, int p
 }
 
 static void fill_directory(struct dir_struct *dir, const char **pathspec,
-		int ignored_too)
+		int ignored_too, int substitute_empty_dirs)
 {
 	const char *path, *base;
 	int baselen;
@@ -63,6 +63,7 @@ static void fill_directory(struct dir_struct *dir, const char **pathspec,
 		if (!access(excludes_file, R_OK))
 			add_excludes_from_file(dir, excludes_file);
 	}
+	dir->substitute_empty_directories = substitute_empty_dirs;
 
 	/*
 	 * Calculate common prefix for the pathspec, and
@@ -143,7 +144,8 @@ static const char ignore_warning[] =
 int cmd_add(int argc, const char **argv, const char *prefix)
 {
 	int i, newfd;
-	int verbose = 0, show_only = 0, ignored_too = 0;
+	int verbose = 0, show_only = 0, ignored_too = 0,
+		substitute_empty_dirs = 0;
 	const char **pathspec;
 	struct dir_struct dir;
 	int add_interactive = 0;
@@ -191,6 +193,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 			take_worktree_changes = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--add-empty-dirs")) {
+			substitute_empty_dirs = 1;
+			continue;
+		}
 		usage(builtin_add_usage);
 	}
 
@@ -206,7 +212,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	}
 	pathspec = get_pathspec(prefix, argv + i);
 
-	fill_directory(&dir, pathspec, ignored_too);
+	fill_directory(&dir, pathspec, ignored_too, substitute_empty_dirs);
 
 	if (show_only) {
 		const char *sep = "", *eof = "";
@@ -231,8 +237,19 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		exit(1);
 	}
 
-	for (i = 0; i < dir.nr; i++)
+	for (i = 0; i < dir.nr; i++) {
+		const char *name = dir.entries[i]->name;
+		const char *slash;
+		if (substitute_empty_dirs && (slash = strrchr(name, '/')) &&
+				!strcmp(slash, "/.gitignore") &&
+				access(name, R_OK)) {
+			int fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0666);
+			if (fd < 0)
+				return error("Could not create %s", name);
+			close(fd);
+		}
 		add_file_to_cache(dir.entries[i]->name, verbose);
+	}
 
  finish:
 	if (active_cache_changed) {
diff --git a/dir.c b/dir.c
index 8d8faf5..b0b4628 100644
--- a/dir.c
+++ b/dir.c
@@ -456,11 +456,11 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 {
 	DIR *fdir = opendir(path);
 	int contents = 0;
+	char fullname[PATH_MAX + 1];
 
 	if (fdir) {
 		int exclude_stk;
 		struct dirent *de;
-		char fullname[PATH_MAX + 1];
 		memcpy(fullname, base, baselen);
 
 		exclude_stk = push_exclude_per_directory(dir, base, baselen);
@@ -536,6 +536,20 @@ exit_early:
 
 		pop_exclude_per_directory(dir, exclude_stk);
 	}
 
+	if (!contents && dir->substitute_empty_directories) {
+		const char *name = ".gitignore";
+		int len = strlen(name);
+		/* Ignore overly long pathnames! */
+		if (len + baselen + 8 > sizeof(fullname))
+			return 0;
+		memcpy(fullname + baselen, name, len+1);
+		if (simplify_away(fullname, baselen + len, simplify)
+				|| excluded(dir, fullname))
+			return 0;
+		dir_add_name(dir, fullname, baselen + len);
+		return 1;
+	}
+
 	return contents;
 }
diff --git a/dir.h b/dir.h
index ec0e8ab..0099718 100644
--- a/dir.h
+++ b/dir.h
@@ -34,7 +34,8 @@ struct dir_struct {
 		     show_other_directories:1,
 		     hide_empty_directories:1,
 		     no_gitlinks:1,
-		     collect_ignored:1;
+		     collect_ignored:1,
+		     substitute_empty_directories:1;
 	struct dir_entry **entries;
 	struct dir_entry **ignored;

^ permalink raw reply related [flat|nested] 137+ messages in thread
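For readers who want the effect of the placeholder approach without patching git, roughly the same result can be had from the shell before running `git add`. This is a sketch under assumptions, not part of the patch: the function name is made up, and it relies on `find -empty`, which GNU and BSD find provide but POSIX does not.

```shell
#!/bin/sh
# Create an empty .gitignore in every empty directory below the current
# one, so that git has a file to track there. The .git directory itself
# is pruned and never touched.
add_placeholders() {
    find . -path ./.git -prune -o -type d -empty \
        -exec sh -c 'for d; do : > "$d/.gitignore"; done' sh {} +
}
```

As David Kastrup notes in his reply to the patch, any placeholder scheme of this kind means a checkout will produce a `.gitignore` file in directories that were originally empty.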
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 12:16 ` Johannes Schindelin @ 2007-07-19 12:24 ` David Kastrup 2007-07-19 14:44 ` Brian Gernhardt 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-19 12:24 UTC (permalink / raw) To: git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Hi, > > On Thu, 19 Jul 2007, Tomash Brechko wrote: > >> A year or so ago I too would strongly advocate the need of tracking >> empty directories, permissions et al., it seemed so "natural" and "plain >> obvious" to me back then. But since that time I learned to appreciate >> the "contents tracking" approach, and now view directories (paths in >> general) only as the means for Git to know where to put the contents on >> checkout. This, BTW, is consistent with how Git figures container >> copies/renames. > > Thank you. It is my impression, too, that after a while it becomes > obvious what is good and what is not. > > FWIW I just whipped up a proof-of-concept patch (so at least _I_ cannot be > accused of chickening out of writing code): > > This adds the command line option "--add-empty-dirs" to "git add", which > does the only sane thing: putting a placeholder into that directory, and > adding that. Since ".gitignore" is already a reserved file name in git, > it is used as the name of this place holder. But that means that checkout will create a file .gitignore in previously empty directories, doesn't it? I think that the placeholder name should rather be ".". -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 12:24 ` David Kastrup @ 2007-07-19 14:44 ` Brian Gernhardt 2007-07-19 15:43 ` Johannes Schindelin 0 siblings, 1 reply; 137+ messages in thread From: Brian Gernhardt @ 2007-07-19 14:44 UTC (permalink / raw) To: David Kastrup; +Cc: git On Jul 19, 2007, at 8:24 AM, David Kastrup wrote: > I think that the placeholder name should rather be ".". For what it's worth, the more this gets discussed, the more I think your idea is a good one. ~~ Brian ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 14:44 ` Brian Gernhardt @ 2007-07-19 15:43 ` Johannes Schindelin 2007-07-19 16:06 ` Brian Gernhardt ` (2 more replies) 0 siblings, 3 replies; 137+ messages in thread From: Johannes Schindelin @ 2007-07-19 15:43 UTC (permalink / raw) To: Brian Gernhardt; +Cc: David Kastrup, git Hi, On Thu, 19 Jul 2007, Brian Gernhardt wrote: > > On Jul 19, 2007, at 8:24 AM, David Kastrup wrote: > > > I think that the placeholder name should rather be ".". > > For what it's worth, the more this gets discussed, the more I think your > idea is a good one. I do not like it at all. "." already has a very special meaning. It is a _directory_, no place holder. More and more I get the impression that this thread is just not worth it. The problem was solved long ago, and all that is talked about here is how to complicate things. Unhappy, Dscho ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 15:43 ` Johannes Schindelin @ 2007-07-19 16:06 ` Brian Gernhardt 2007-07-19 16:17 ` Johannes Schindelin 2007-07-19 16:17 ` Matthieu Moy 2007-07-19 16:21 ` David Kastrup 2 siblings, 1 reply; 137+ messages in thread From: Brian Gernhardt @ 2007-07-19 16:06 UTC (permalink / raw) To: Johannes Schindelin; +Cc: David Kastrup, git On Jul 19, 2007, at 11:43 AM, Johannes Schindelin wrote: > I do not like it at all. "." already has a very special meaning. > It is a _directory_, no place holder. And we're talking about using it to describe the directory. > More and more I get the impression that this thread is just not > worth it. The problem was solved long ago, and all that is talked > about here is how to complicate things. By solved, you mean ignored? There is no reason for git not to track empty directories other than "we don't like it". Some projects I work on require certain directories to exist in order to run properly, but tend to occasionally do things like delete all files in this required directory. So far, it hasn't been an issue because I'm working solo and using git just to bar against stupidity. Git's policy of "don't touch things I don't know about" works. But if I ever had to have someone clone it, they'd need to re-create the directories. In this case, empty directories are part of the content I care about. Yes, I could have a script do it, but that's a workaround, not a solution. In another case, I'm creating a git repository out of source that is distributed as occasional tarballs with patches in between. Git's lack of ability to track the empty directories means that I can NOT re-create appropriate tarballs for the states distributed only as patches. Yes, I could add placeholder files, but then the state is not identical. There are use cases for tracking directories. I'll agree that it shouldn't be used for every source tree.
But there are cases where it is useful and there's no reason to simply forbid it. ~~ Brian ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 16:06 ` Brian Gernhardt @ 2007-07-19 16:17 ` Johannes Schindelin 2007-07-19 16:28 ` David Kastrup 2007-07-19 16:34 ` Brian Gernhardt 0 siblings, 2 replies; 137+ messages in thread From: Johannes Schindelin @ 2007-07-19 16:17 UTC (permalink / raw) To: Brian Gernhardt; +Cc: git Hi, On Thu, 19 Jul 2007, Brian Gernhardt wrote: > On Jul 19, 2007, at 11:43 AM, Johannes Schindelin wrote: > > > I do not like it at all. "." already has a very special meaning. It > > is a _directory_, no place holder. > > And we're talking about using it to describe the directory. > > > More and more I get the impression that this thread is just not worth > > it. The problem was solved long ago, and all that is talked about here > > is how to complicate things. > > By solved, you mean ignored? There is no reason for git not to track > empty directories other than "we don't like it". No, no, no, no, no! You are really trying to annoy me, right? Here a short description, which you should read until you understand it and then leave me alone: To add a directory to the tracked content, you have to _mark_ it as tracked. So that when you remove the _real_ content of the directory, Git will not remove it. Alas, we already have such a marker. It is called ".gitignore", and has been ignored by _you_. There is _nothing_ wrong, from a technical standpoint, to call this marker ".gitignore", and it is _also_ not wrong to put this marker into the file system _in addition_ to the index. So go and add your directories via that marker, and _be done with it_. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 16:17 ` Johannes Schindelin @ 2007-07-19 16:28 ` David Kastrup 2007-07-19 16:34 ` Brian Gernhardt 1 sibling, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-19 16:28 UTC (permalink / raw) To: git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Here a short description, which you should read until you understand > it and then leave me alone: > > To add a directory to the tracked content, you have to _mark_ it as > tracked. So that when you remove the _real_ content of the > directory, Git will not remove it. Correct. That is what my proposal is about. > Alas, we already have such a marker. It is called ".gitignore", and > has been ignored by _you_. There is _nothing_ wrong, from a > technical standpoint, to call this marker ".gitignore", and it is > _also_ not wrong to put this marker into the file system _in > addition_ to the index. Uh, then the directories are no longer empty. > So go and add your directories via that marker, and _be done with > it_. But one is not done before running find -name .gitignore -delete and then the next recursive add will remove the .gitignore "markers". The idea of "." is to have a marker that does _not_ appear in the work directory. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 16:17 ` Johannes Schindelin 2007-07-19 16:28 ` David Kastrup @ 2007-07-19 16:34 ` Brian Gernhardt 2007-07-19 17:30 ` Johannes Schindelin [not found] ` <Pine.LNX.4.64.0707191829530.14781@racer.site> 1 sibling, 2 replies; 137+ messages in thread From: Brian Gernhardt @ 2007-07-19 16:34 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git On Jul 19, 2007, at 12:17 PM, Johannes Schindelin wrote: > Alas, we already have such a marker. It is called ".gitignore", > and has > been ignored by _you_. There is _nothing_ wrong, from a technical > standpoint, to call this marker ".gitignore", and it is _also_ not > wrong > to put this marker into the file system _in addition_ to the index. > > So go and add your directories via that marker, and _be done with it_. But this alters the content of the directory away from what I want it to be, namely empty. You aren't addressing the concept of tracking an empty directory, you're just saying you won't do it. ~~ Brian ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 16:34 ` Brian Gernhardt @ 2007-07-19 17:30 ` Johannes Schindelin [not found] ` <Pine.LNX.4.64.0707191829530.14781@racer.site> 1 sibling, 0 replies; 137+ messages in thread From: Johannes Schindelin @ 2007-07-19 17:30 UTC (permalink / raw) To: Brian Gernhardt; +Cc: git Hi, On Thu, 19 Jul 2007, Brian Gernhardt wrote: > On Jul 19, 2007, at 12:17 PM, Johannes Schindelin wrote: > > > Alas, we already have such a marker. It is called ".gitignore", and has > > been ignored by _you_. There is _nothing_ wrong, from a technical > > standpoint, to call this marker ".gitignore", and it is _also_ not wrong > > to put this marker into the file system _in addition_ to the index. > > > > So go and add your directories via that marker, and _be done with it_. > > But this alters the content of the directory away from what I want it to be, > namely empty. You aren't addressing the concept of tracking an empty > directory, you're just saying you won't do it. OMG last time I checked, my _empty_ directory contained "." and "..". What do I do now? Really, Dscho ^ permalink raw reply [flat|nested] 137+ messages in thread
[parent not found: <Pine.LNX.4.64.0707191829530.14781@racer.site>]
* Re: [RFC PATCH] Re: Empty directories... [not found] ` <Pine.LNX.4.64.0707191829530.14781@racer.site> @ 2007-07-19 17:47 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-19 17:47 UTC (permalink / raw) To: git; +Cc: Johannes Schindelin Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Hi, > > On Thu, 19 Jul 2007, Brian Gernhardt wrote: > >> On Jul 19, 2007, at 12:17 PM, Johannes Schindelin wrote: >> >> > Alas, we already have such a marker. It is called ".gitignore", and has >> > been ignored by _you_. There is _nothing_ wrong, from a technical >> > standpoint, to call this marker ".gitignore", and it is _also_ not wrong >> > to put this marker into the file system _in addition_ to the index. >> > >> > So go and add your directories via that marker, and _be done with it_. >> >> But this alters the content of the directory away from what I want it to be, >> namely empty. You aren't addressing the concept of tracking an empty >> directory, you're just saying you won't do it. > > OMG last time I checked, my _empty_ directory contained "." and "..". > What do I do now? If you have a suitable Solaris system, you could try sudo unlink . sudo unlink .. and have a chance that this will work until the next file system check. I don't think that adding tracking of ".." would be easy to implement in git, but I seem to remember that somebody recently proposed a plan of at least tracking "." which would seem better than nothing and possibly more useful than "sudo unlink .". All the best, -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 15:43 ` Johannes Schindelin 2007-07-19 16:06 ` Brian Gernhardt @ 2007-07-19 16:17 ` Matthieu Moy 2007-07-19 16:21 ` David Kastrup 2 siblings, 0 replies; 137+ messages in thread From: Matthieu Moy @ 2007-07-19 16:17 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Brian Gernhardt, David Kastrup, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > More and more I get the impression that this thread is just not worth it. > The problem was solved long ago, and all that is talked about here is how > to complicate things. The problem was not _solved_, it was _worked around_. Adding a .gitignore or whatever other file to mean "the directory exists" is clearly a good workaround, but still, you have to use "git-add $dir/.gitignore" where you really _mean_ "git-add $dir/". I can see no reason for the presence of this .gitignore file other than "err, I've put it here because git doesn't manage empty directories". The fact that you need a FAQ entry for that actually shows there is a problem. You don't have a FAQ for "Q: How do I add a file? A: Use git-add file", you shouldn't need a FAQ for "How do I add a directory", it should just work as expected. You claim it "solves" the problem, but have you ever used an importer like git-svn on a project that uses empty directories as placeholders (I do have this problem in daily life because my colleagues still use SVN)? What is the meaning of this .gitignore file the day you export it to anything outside git? If you ignore problems because they have a workaround, then even CVS can be usable. People have been working around CVS's problems for years, and many people are happy with CVS because they didn't realise that solving problems is better than working around them (See the OpenCVS project ...). Fortunately, git doesn't have as many problems to work around as CVS ;-).
I'm happy with the answer "it should be done, but not by me, send a patch", and I can't really complain myself since I did not send a patch, but here, you're complaining about someone who actually starts volunteering to solve the problem, which I can't agree with. -- Matthieu ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 15:43 ` Johannes Schindelin 2007-07-19 16:06 ` Brian Gernhardt 2007-07-19 16:17 ` Matthieu Moy @ 2007-07-19 16:21 ` David Kastrup 2 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-19 16:21 UTC (permalink / raw) To: git; +Cc: Johannes Schindelin Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > On Thu, 19 Jul 2007, Brian Gernhardt wrote: > >> >> On Jul 19, 2007, at 8:24 AM, David Kastrup wrote: >> >> > I think that the placeholder name should rather be ".". >> >> For what it's worth, the more this gets discussed, the more I think your >> idea is a good one. > > I do not like it at all. "." already has a very special meaning. It is a > _directory_, no place holder. And this is what it will be under my scheme: a directory. It is just that "directory" is differentiated from a "tree". Both are tracked in the repository (directory tracking is optional), and there is no such thing as an empty tree, a tree being defined by its contents and nothing else, as previously. A "directory" has no contents, but only existence in index and repository. A "tree" only exists in the repository, not in index or work directory. It is mapped to physical directories in the work directory. If no corresponding "directory" exists in index and/or repository, the work directories are created and deleted on the fly as before in order to represent the state of the "tree" in the repository. So here are the concepts:

entity  working directory        index          repository
--------------------------------------------------------------
file    mapped to files          file           [blob]
dir     mapped to dir existence  dir            [dir]
tree    mapped to dir tree       unrepresented  [tree]
(non-empty container)

> More and more I get the impression that this thread is just not > worth it. The problem was solved long ago, and all that is talked > about here is how to complicate things. I disagree on both counts: that the problem has been solved (the existence of a workaround involving constant manual intervention is not a solution for me), and that my proposal will constitute a complication to the user. For projects setting a "." into the top level .gitignore, nothing at all will change, even when "core.adddirs: true" will become the default at some point of time. Once this is the default, new users with new projects will not notice anything surprising, at least until the time that they pull from somebody with a repository with different non-explicit conventions. This is something which may still require thought in order to result in the least complicated handling of cooperation. But with regard to the internals itself, I don't see that there is too much non-obvious complexity involved here, and the framework appears very consistent, logical, and compatible with git's ideas to me. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
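For illustration only, the file/dir/tree distinction proposed in the message above could be modelled roughly as follows. This is a minimal sketch and not git code: the `entry` struct, the `ENTRY_DIR_MARKER` kind, and the convention of spelling the marker as a path ending in "/." are all assumptions made for the example. The point it shows is that a work-tree directory survives exactly as long as some tracked entry lives under it, and a "." marker counts as such an entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Kinds of tracked entries in the proposed scheme (illustrative names). */
enum entry_kind { ENTRY_FILE, ENTRY_DIR_MARKER };

struct entry {
	enum entry_kind kind;
	const char *path;	/* "A/B" for a file, "A/." for a directory marker */
};

/* True if `path` names something strictly inside directory `dir`. */
static bool is_under(const char *dir, const char *path)
{
	size_t n = strlen(dir);

	return strncmp(dir, path, n) == 0 && path[n] == '/';
}

/*
 * Should the work-tree directory `dir` exist for this set of entries?
 * It exists while any entry, file or "." marker, is tracked beneath it.
 */
static bool dir_should_exist(const struct entry *e, size_t n, const char *dir)
{
	for (size_t i = 0; i < n; i++)
		if (is_under(dir, e[i].path))
			return true;
	return false;
}
```

With only `A/.` tracked, `dir_should_exist` keeps `A` alive after its last file is removed; with no marker and no files, `A` goes away on the fly as it does today.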
[parent not found: <9436820E-53D1-425D-922E-D4C76578E40A@silverinsanity.com>]
[parent not found: <863azk78yp.fsf@lola.quinscape.zz>]
* Re: [RFC PATCH] Re: Empty directories... [not found] ` <863azk78yp.fsf@lola.quinscape.zz> @ 2007-07-19 15:08 ` Brian Gernhardt 2007-07-19 15:27 ` David Kastrup 2007-07-20 0:01 ` Junio C Hamano 0 siblings, 2 replies; 137+ messages in thread From: Brian Gernhardt @ 2007-07-19 15:08 UTC (permalink / raw) To: David Kastrup Cc: Shawn O.Pearce, Junio C Hamano, Linus Torvalds, Matthieu Moy, Johannes Schindelin, Git Mailing List On Jul 19, 2007, at 10:40 AM, David Kastrup wrote: > Have you synched with the current state of my proposals posted to the > mailing list before posting this note? Perhaps your concerns have > already been addressed in them. Mail.app split the thread into two or three pieces. I wrote this after reading the first part, but had missed the rest. I very much like the proposals of separating trees from directories and the "." entries. My apologies for the wasted bandwidth arguing for things that had already been decided. ~~ Brian ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 15:08 ` Brian Gernhardt @ 2007-07-19 15:27 ` David Kastrup 2007-07-19 15:50 ` Brian Gernhardt 2007-07-20 0:01 ` Junio C Hamano 1 sibling, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-19 15:27 UTC (permalink / raw) To: git Brian Gernhardt <benji@silverinsanity.com> writes: > On Jul 19, 2007, at 10:40 AM, David Kastrup wrote: > >> Have you synched with the current state of my proposals posted to the >> mailing list before posting this note? Perhaps your concerns have >> already been addressed in them. > > Mail.app split the thread into two or three pieces. I wrote this > after reading the first part, but had missed the rest. I very much > like the proposals of separating trees from directories and the "." > entries. > > My apologies for the wasted bandwidth arguing for things that had > already been decided. "decided"! Now that's a strong word for my wild brainstorming if I ever heard one, in particular considering my well-near non-existent record of contributions and popularity here: most of the recent "discussion" has been me following up on myself. Anyway, thanks for the heads-up: very much appreciated. I'll probably badly need it when people in Pacific Standard Time get to work again and tear me to pieces. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 15:27 ` David Kastrup @ 2007-07-19 15:50 ` Brian Gernhardt 0 siblings, 0 replies; 137+ messages in thread From: Brian Gernhardt @ 2007-07-19 15:50 UTC (permalink / raw) To: David Kastrup; +Cc: git On Jul 19, 2007, at 11:27 AM, David Kastrup wrote: > Brian Gernhardt <benji@silverinsanity.com> writes: > >> My apologies for the wasted bandwidth arguing for things that had >> already been decided. > > "decided"! Now that's a strong word for my wild brainstorming if I > ever heard one, in particular considering my well-near non-existent > record of contributions and popularity here: most of the recent > "discussion" has been me following up on myself. Meh. I suppose I meant "talked about" or "brought up" here. Trying to be quick and terse, and ended up losing meaning like usual. ~~ Brian ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 15:08 ` Brian Gernhardt 2007-07-19 15:27 ` David Kastrup @ 2007-07-20 0:01 ` Junio C Hamano 2007-07-20 0:15 ` Linus Torvalds 1 sibling, 1 reply; 137+ messages in thread From: Junio C Hamano @ 2007-07-20 0:01 UTC (permalink / raw) To: Brian Gernhardt Cc: David Kastrup, Shawn O.Pearce, Linus Torvalds, Matthieu Moy, Johannes Schindelin, Git Mailing List Brian Gernhardt <benji@silverinsanity.com> writes: > My apologies for the wasted bandwidth arguing for things that had > already been decided. Sorry, who decided what? ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 0:01 ` Junio C Hamano @ 2007-07-20 0:15 ` Linus Torvalds 2007-07-20 0:33 ` Linus Torvalds 2007-07-20 10:19 ` Olivier Galibert 0 siblings, 2 replies; 137+ messages in thread From: Linus Torvalds @ 2007-07-20 0:15 UTC (permalink / raw) To: Junio C Hamano Cc: Brian Gernhardt, David Kastrup, Shawn O.Pearce, Matthieu Moy, Johannes Schindelin, Git Mailing List On Thu, 19 Jul 2007, Junio C Hamano wrote: > > Brian Gernhardt <benji@silverinsanity.com> writes: > > > My apologies for the wasted bandwidth arguing for things that had > > already been decided. > > Sorry, who decided what? I think people who didn't know how the world works decided that directories that were added manually as directories would stay as directories even after the last file was removed. That's physically impossible with the git data-structures (since there is no way of saving "this directory was added empty" in the tree structures, nor any point to it), so I think it's just insane rambling. I dunno. I think empty directories are worth supporting, mainly to be able to capture other SCM's notion of what _they_ track, but quite frankly, the level of discussion about them hasn't been exactly inspiring. It seems to be more about "this is what we'd like to see, without really having a reason for it, nor necessarily understanding what we're talking about" than "this is realistic and useful and here are patches". I *do* think that it's a very valid argument that if you import something from SVN that has an empty directory, the git import should show that. That's about the only valid argument I've ever seen for them, though, and I think that's totally irrelevant to such issues as to whether "git rm file/in/directory" should remove the directory(*) from being tracked by git when the file goes away or not. 
Linus (*) And, for anybody confused about the issue, the answer to the latter question is an emphatic: "Yes it should, live with it, and if you want the directory back, you had better add it back as an empty directory" ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 0:15 ` Linus Torvalds @ 2007-07-20 0:33 ` Linus Torvalds 2007-07-20 2:24 ` Junio C Hamano ` (2 more replies) 2007-07-20 10:19 ` Olivier Galibert 1 sibling, 3 replies; 137+ messages in thread From: Linus Torvalds @ 2007-07-20 0:33 UTC (permalink / raw) To: Junio C Hamano Cc: Brian Gernhardt, David Kastrup, Shawn O.Pearce, Matthieu Moy, Johannes Schindelin, Git Mailing List On Thu, 19 Jul 2007, Linus Torvalds wrote: > > That's physically impossible with the git data-structures (since there is > no way of saving "this directory was added empty" in the tree structures, > nor any point to it), so I think it's just insane rambling. Of course, it's physically *possible* to have a tree that contains two entries for the same name: first the "empty tree" and then the "real tree", and yeah, in theory you could track things that way. So I guess the "physically impossible" was a bit strong. You'd have to have a totally insane format, and you'd have to violate deeply seated rules about what trees look like (and the index too, for that matter: we'd have to do the same for the index, and keep the S_IFDIR entry alive despite having other entries that are children of it), but it's *possible*. It's just a really bad idea. So to be sane, when you add files, the empty directory entry has to go away. Otherwise you could have two very different trees that encode the same *content* (just with different ways of getting there - depending on whether you have a history with empty trees or not), and that's very much against the philosophy of git, and breaks some fundamental rules (like the fact that "same content == same SHA1"). In fact, that may be the best way to explain why it's *not* an option to have "empty trees remain empty trees if we remove the last file from them": git fundamentally tracks "content snapshots", and anything that implies the content containing any history is against the rules.
So the whole notion of "remembering" whether a directory was added explicitly as an empty directory or not is just not a sensible concept in git. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
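For illustration only, the "same content == same SHA1" rule from the message above can be sketched with a toy tree identifier. This is not git's object format: the 64-bit FNV-1a hash is a stand-in for SHA-1, and the `tree_entry` struct and `tree_id` function are hypothetical. The sketch shows that the identifier is a pure function of the entries, so it cannot depend on how the tree was arrived at, and that any extra entry (such as a marker for an explicitly added directory) produces a different identifier.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy tree entry; `blob_id` stands in for the object name of the content. */
struct tree_entry {
	unsigned mode;
	const char *name;
	uint64_t blob_id;
};

/* 64-bit FNV-1a, used here only as a stand-in for SHA-1. */
static uint64_t fnv1a(uint64_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		h ^= *p++;
		h *= 1099511628211ULL;
	}
	return h;
}

/* The identifier depends only on the entries present, never on history. */
static uint64_t tree_id(const struct tree_entry *e, size_t n)
{
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < n; i++) {
		h = fnv1a(h, &e[i].mode, sizeof(e[i].mode));
		h = fnv1a(h, e[i].name, strlen(e[i].name) + 1);
		h = fnv1a(h, &e[i].blob_id, sizeof(e[i].blob_id));
	}
	return h;
}
```

Two snapshots holding the same entries hash identically whether or not a directory was once empty; a snapshot that carries an additional entry is, by construction, a different object.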
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 0:33 ` Linus Torvalds @ 2007-07-20 2:24 ` Junio C Hamano 2007-07-20 2:31 ` Linus Torvalds 2007-07-20 5:58 ` David Kastrup 2007-07-20 5:35 ` David Kastrup [not found] ` <7vir8f24o2.fsf@assigned-by-dhcp.cox.net> 2 siblings, 2 replies; 137+ messages in thread From: Junio C Hamano @ 2007-07-20 2:24 UTC (permalink / raw) To: Linus Torvalds Cc: Brian Gernhardt, David Kastrup, Shawn O.Pearce, Matthieu Moy, Johannes Schindelin, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > So the whole notion of "remembering" whether a directory was added > explicitly as an empty directory or not is just not a sensible concept in > git. That is true if it is implemented as David suggested, to have a phony "." entry in the tree object itself. The object name of such a tree (when it contains blobs and trees underneath) will be different from a tree that contains the same set of blobs and trees. It would destroy the fundamental concepts of git. But you _could_ treat that "should-be-kept-even-when-empty"-ness just like we treat executable bit on blobs, I think. When blobs with the same contents but of different type (REG vs LNK) and regular file with or without executable bit are entered in git, they all get the same SHA-1 but we can still tell them apart because the index and the tree entry have mode bits. So hypothetically, you could introduce "sticky" directory in tree entries to mark "this will not go away when emptied". In a 'tree' object, they might appear as:

    40000 ordinary-directory '\0' 20-byte SHA-1
    41000 directory-dontremove-even-if-empty '\0' 20-byte SHA-1

In 'index', as your "I'm soft" patch, we do not have to add nonsticky kind of tree nodes, but for "empty" ones, we can add:

    041000 directory-dontremove-even-if-empty '\0' 20-byte SHA-1

in the index and (unlike your patch) keep it there even after a blob or a tree is added underneath it. The "sticky" bit on such a directory would have to obey the usual rule of 3-way merge, which would be a huge change to do so, but I do not see there is anything fundamental that prevents you from doing this. Other than the fact that probably no git long timer is interested in spending time on such a feature, that is. Obviously, this "sticky" bit will cascade up and make your otherwise equivalent parent trees different, but I think that is just as sane a behaviour as two trees that contain the same blob with only executable-bit differences have different names. This will involve a lot of changes, so I would not recommend anybody doing so, though. ^ permalink raw reply [flat|nested] 137+ messages in thread
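For illustration only, the mode-bit idea in the message above can be sketched as a small classifier over the octal mode string of a tree entry. Mode 40000 is what git records for a directory entry in a tree; the 01000 "sticky" bit and the names `STICKY_BIT`, `dir_kind` and `classify_mode` are hypothetical, taken from or invented around the 41000 example in the message, and do not exist in git.

```c
#include <stdlib.h>

#define MODE_TYPE_MASK 0170000u	/* file-type bits of a mode */
#define MODE_DIR       0040000u	/* directory entry in a tree object */
#define STICKY_BIT     0001000u	/* hypothetical "keep even when empty" bit */

enum dir_kind { NOT_A_DIR, ORDINARY_DIR, STICKY_DIR };

/* Classify the octal mode text of a tree entry, e.g. "40000" or "41000". */
static enum dir_kind classify_mode(const char *octal)
{
	unsigned long mode = strtoul(octal, NULL, 8);

	if ((mode & MODE_TYPE_MASK) != MODE_DIR)
		return NOT_A_DIR;
	return (mode & STICKY_BIT) ? STICKY_DIR : ORDINARY_DIR;
}
```

Because the flag lives in the entry's mode rather than in the directory's own content, two entries pointing at the same tree object could still be told apart, in the same way the executable bit distinguishes two entries for the same blob.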
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 2:24 ` Junio C Hamano @ 2007-07-20 2:31 ` Linus Torvalds 2007-07-20 5:55 ` David Kastrup 2007-07-20 5:58 ` David Kastrup 1 sibling, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-20 2:31 UTC (permalink / raw) To: Junio C Hamano Cc: Brian Gernhardt, David Kastrup, Shawn O.Pearce, Matthieu Moy, Johannes Schindelin, Git Mailing List On Thu, 19 Jul 2007, Junio C Hamano wrote: > > But you _could_ treat that "should-be-kept-even-when-empty"-ness > just like we treat executable bit on blobs, I think. True. Or you could make it a path attribute and/or a per-repository decision, so that while the data wouldn't necessarily be in the database itself, the user could specify the behaviour he wanted. > This will involve a lot of changes, so I would not recommend > anybody doing so, though. Agreed. The upside just isn't there. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 2:31 ` Linus Torvalds @ 2007-07-20 5:55 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-20 5:55 UTC (permalink / raw) To: git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 19 Jul 2007, Junio C Hamano wrote: >> >> But you _could_ treat that "should-be-kept-even-when-empty"-ness >> just like we treat executable bit on blobs, I think. > > True. Or you could make it a path attribute and/or a per-repository > decision, so that while the data wouldn't necessarily be in the > database itself, the user could specify the behaviour he wanted. No, one can't. One can decide per repository whether one wants to permit this kind of information in. But if one does, the information needs to be there for _every_ tree. And a "." entry is a natural and intuitive way to do that. "." has been used as a directory entry for decades in Unix. >> This will involve a lot of changes, so I would not recommend >> anybody doing so, though. > > Agreed. The upside just isn't there. It is a good thing that you did not design the Unix file systems. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 2:24 ` Junio C Hamano 2007-07-20 2:31 ` Linus Torvalds @ 2007-07-20 5:58 ` David Kastrup 2007-07-20 15:31 ` Linus Torvalds 1 sibling, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-20 5:58 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, Brian Gernhardt, Shawn O.Pearce, Matthieu Moy, Johannes Schindelin, Git Mailing List Junio C Hamano <gitster@pobox.com> writes: > Linus Torvalds <torvalds@linux-foundation.org> writes: > >> So the whole notion of "remembering" whether a directory was added >> explicitly as an empty directory or not is just not a sensible concept in >> git. > > That is true if it is implemented as David suggested, to have a > phony "." entry in the tree object itself. Unix file systems contain a phony "." entry in the directory itself, and have survived in spite of this. > The object name of such a tree (when it contains blobs and trees > underneath) will be different from a tree that contains the same set > of blobs and trees. It would destroy the fundamental concepts of > git. Like "." destroyed the fundamental concepts of Unix filesystems. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 5:58 ` David Kastrup @ 2007-07-20 15:31 ` Linus Torvalds 0 siblings, 0 replies; 137+ messages in thread From: Linus Torvalds @ 2007-07-20 15:31 UTC (permalink / raw) To: David Kastrup Cc: Junio C Hamano, Brian Gernhardt, Shawn O.Pearce, Matthieu Moy, Johannes Schindelin, Git Mailing List On Fri, 20 Jul 2007, David Kastrup wrote: > > Like "." destroyed the fundamental concepts of Unix filesystems. David, I'd suggest you just be quiet and learn, instead of spouting idiotic nonsense. When Junio talks about fundamental concepts of git, you should sit back, relax, and ponder. And maybe realize that the git filesystem isn't a "unix filesystem". It's a content-addressable one, it's not POSIX, and yes, it really does have totally different fundamental concepts. So your arguments are just inane and stupid, and show that you aren't worth discussing with, because you don't even understand what you are talking about. So here's a suggestion: how about trying to *understand* git first. After that, you can talk. In fact, at this point, I have an even better suggestion: how about you just shut the hell up until you have a tested patch? Code talks, bullshit walks. And right now you are nothing but bullshit. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 0:33 ` Linus Torvalds 2007-07-20 2:24 ` Junio C Hamano @ 2007-07-20 5:35 ` David Kastrup 2007-07-20 9:27 ` Simon 'corecode' Schubert [not found] ` <7vir8f24o2.fsf@assigned -by-dhcp.cox.net> 2 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-20 5:35 UTC (permalink / raw) To: git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 19 Jul 2007, Linus Torvalds wrote: >> >> That's physically impossible with the git data-structures (since >> there is no way of saving "this directory was added empty" in the >> tree structures, nor any point to it), so I think it's just insane >> rambling. > > Of course, it's physically *possible* to have a tree that contains > two entries for the same name: first the "empty tree" and then the > "real tree", and yeah, in theory you could track things that way. > > So I guess the "physically impossible" was a bit strong. You'd have > to have a totally insane format, and you'd have to violate deeply > seated rules about what trees look like (and the index too, for that > matter: we'd have to do the same for the index, and keep the S_IFDIR > entry alive despite having other entries that are children of it), > but it's *possible*. Excuse me? You don't need a "totally insane format". You need an entry "." of a new type "directory" that can be part of the current concept of a "tree". This new type does _not_ have children. It is not a container for files. It would be the thing that would carry permissions or other properties if git were to store them. It can be put into .gitignore files like other files. One drawback is that adding and removing it alone is not supported with the current git-add and git-remove commands: they would require an additional argument "-d" like "ls" does. All of this is a straightforward extension fitting very well the current paradigms and also existing file systems and their usage. > It's just a really bad idea. 
> So to be sane, when you add files, the empty directory entry has to > go away. You really have not followed the discussion at all. This is not possible since otherwise you could not distinguish the cases "mkdir A; touch A/B; git-add A; git-rm A/B" where A was added and not removed and should stay and "mkdir A; touch A/B; git-add A/B; git-rm A/B" where a single file was added and removed and nothing should stay. > Otherwise you could have two very different trees that encode the > same *content* (just with different ways of getting there - > depending on whether you have a history with empty trees or not), > and that's very much against the philosophy of git, and breaks some > fundamental rules (like the fact that "same content == same SHA1"). No, the content is _different_. One tree contains a tracked directory, the other does not. That means that the trees behave _differently_ when you manipulate them, and that means that they are _not_ the same tree. > In fact, that may be the best way to explain why it's *not* an > option to have "empty trees remain empty trees if we remove the last > file from them": git fundamentally tracks "content snapshots", and > anything that implies the content containing any history is against > the rules. > > So the whole notion of "remembering" whether a directory was added > explicitly as an empty directory or not is just not a sensible > concept in git. Certainly. That is why we instead remember whether or not a directory entry "." was added or not. It will be added (unless the defaults and gitignore settings ask "." to be non-tracked) when git adds the corresponding tree or subtree, and it will get removed when git removes the corresponding tree or subtree. Emptiness is not a special case, and it can't be. Currently, the main information associated with "." is "stay around even if tree becomes empty". Now you can do "unlink ." in Solaris and have the name "." vanish while the directory still works as a container by other names.
I don't propose that git be able to track this difference, though, and I doubt that most file archivers would. But git can or cannot ignore files, and in a similar way it can or cannot ignore what a directory has more than being an abstract container. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
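Editorial sketch: the two command sequences in the message above can be made concrete with a toy model in Python. Everything here is hypothetical (the `ToyIndex` class and its `add_dir`, `add_file`, `rm` and `directories_to_keep` methods are invented for illustration) and only mirrors the "." entry as the message describes it; it is not how git's index works.

```python
class ToyIndex:
    """Toy model of the proposal: a directory is tracked via an explicit '<dir>/.' entry."""

    def __init__(self):
        self.entries = set()

    def add_dir(self, directory, files):
        # Models `git-add A`: tracks the directory itself plus the files in it.
        self.entries.add(directory + "/.")
        for name in files:
            self.entries.add(directory + "/" + name)

    def add_file(self, path):
        # Models `git-add A/B`: tracks only the file, not its directory.
        self.entries.add(path)

    def rm(self, path):
        # Models `git-rm A/B`: removes exactly one entry.
        self.entries.discard(path)

    def directories_to_keep(self):
        # Directories that would have to exist on checkout even with no files in them.
        return sorted(e[:-2] for e in self.entries if e.endswith("/."))


# Case 1: mkdir A; touch A/B; git-add A; git-rm A/B
case1 = ToyIndex()
case1.add_dir("A", ["B"])
case1.rm("A/B")

# Case 2: mkdir A; touch A/B; git-add A/B; git-rm A/B
case2 = ToyIndex()
case2.add_file("A/B")
case2.rm("A/B")

print(case1.entries)                # {'A/.'}
print(case2.entries)                # set()
print(case1.directories_to_keep())  # ['A']
```

In this model the two end states differ only by the explicit `A/.` entry, which is the distinction the proposal relies on and the one the replies object to.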
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 5:35 ` David Kastrup @ 2007-07-20 9:27 ` Simon 'corecode' Schubert 2007-07-20 10:11 ` David Kastrup 2007-07-20 10:34 ` Junio C Hamano 0 siblings, 2 replies; 137+ messages in thread From: Simon 'corecode' Schubert @ 2007-07-20 9:27 UTC (permalink / raw) To: David Kastrup; +Cc: git David Kastrup wrote: >> Otherwise you could have two very different trees that encode the >> same *content* (just with different ways of getting there - >> depending on whether you have a history with empty trees or not), >> and that's very much against the philosophy of git, and breaks some >> fundamental rules (like the fact that "same content == same SHA1"). > > No, the content is _different_. One tree contains a tracked > directory, the other does not. That means that the trees behave > _differently_ when you manipulate them, and that means that they are > _not_ the same tree. You are mistaking things. Like the executable bit on a file is not content, the fact that a directory should be kept despite being empty is also an *attribute* of the directory. This is meta-data, not actual data (content). So no matter how elegant tracking the "." entry might be (and I think it is, because it covers a lot of corner cases already), it puts the information at the wrong place. That's sad, because otherwise it would be really elegant. cheers simon ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 9:27 ` Simon 'corecode' Schubert @ 2007-07-20 10:11 ` David Kastrup 2007-07-20 10:34 ` Junio C Hamano 1 sibling, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-20 10:11 UTC (permalink / raw) To: git Simon 'corecode' Schubert <corecode@fs.ei.tum.de> writes: > David Kastrup wrote: >>> Otherwise you could have two very different trees that encode the >>> same *content* (just with different ways of getting there - >>> depending on whether you have a history with empty trees or not), >>> and that's very much against the philosophy of git, and breaks some >>> fundamental rules (like the fact that "same content == same SHA1"). >> >> No, the content is _different_. One tree contains a tracked >> directory, the other does not. That means that the trees behave >> _differently_ when you manipulate them, and that means that they are >> _not_ the same tree. > > You are mistaking things. No, I am redefining them, or rather the view on them. Subtle difference. > Like the executable bit on a file is not content, the fact that a > directory should be kept despite being empty is also an *attribute* > of the directory. This is meta-data, not actual data (content). We need to track it, anyway. So there is little point in not using the existing infrastructure for handling named entities. > So no matter how elegant tracking the "." entry might be (and I > think it is, because it covers a lot of corner cases already), it > puts the information at the wrong place. I don't see that the place is wrong: after all, that is where Unix places "." too, and for good reason. I was arguing for _separating_ the concept of "directory" and "tree" in the repository. The tree is a container entity defined exclusively by its contents (which determine its hash). That is how git already does things. 
There is _no_ connection with the physical existence of a directory: in the work directory, git creates and deletes directories as a _side-effect_ of storing and removing trees. But git itself does not track directories as a physical entity at _all_. If you had a flat filesystem allowing slashes in filenames, git would get along better than it does now, without ever creating or removing a directory. Trees are just a convenient selection and pattern matching mechanism for files as far as git is concerned. The correspondence to physical directories in the work directory is a nuisance rather than an asset as far as git is concerned. In a recent thread here, tags with slashes were supported by essentially doing mkdir -p "`dirname $TAG`" touch $TAG where directory creation is just a side effect of supporting slashes. And that, if you look closely, is git's current relation with directories altogether. The directories in the work file system are created by git just as a side effect for representing slashes, which in turn facilitate a certain manner of pattern matching. And "." seems perfectly well suited to bring across the point that there actually is _physical_ existence associated with a directory, existence that remains when the rest of the tree is gone and _makes_ a difference to what the tree is, because it has a _different_ representation in the work file system. Storing it as an _attribute_ of the tree is a bad idea, since then the simple rule "a tree without contents is empty" needs an exception. And a tree stops becoming just a container of its contents and all sort of new exceptions creep up. There are some systems where the difference between directory as a file and directory as a structuring method are more apparent than under Unix (some utilities like rsync differentiate between A/B and A/B/ to bring across that difference). Here is an example for some Emacs function concerned with the concept: directory-file-name is a built-in function in `C source code'. 
(directory-file-name DIRECTORY) Returns the file name of the directory named DIRECTORY. This is the name of the file that holds the data for the directory DIRECTORY. This operation exists because a directory is also a file, but its name as a directory is different from its name as a file. In Unix-syntax, this function just removes the final slash. On VMS, given a VMS-syntax directory name such as "[X.Y]", it returns a file name such as "[X]Y.DIR.1". [back] > That's sad, because otherwise it would be really elegant. If something is not elegant because of the angle of view, change the view. And it is not like the different angle has no predecessors or no consistency. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 9:27 ` Simon 'corecode' Schubert 2007-07-20 10:11 ` David Kastrup @ 2007-07-20 10:34 ` Junio C Hamano 2007-07-20 13:23 ` David Kastrup 2007-07-20 19:24 ` Linus Torvalds 1 sibling, 2 replies; 137+ messages in thread From: Junio C Hamano @ 2007-07-20 10:34 UTC (permalink / raw) To: Simon 'corecode' Schubert; +Cc: David Kastrup, git Simon 'corecode' Schubert <corecode@fs.ei.tum.de> writes: > You are mistaking things. Like the executable bit on a file > is not content, the fact that a directory should be kept > despite being empty is also an *attribute* of the directory. > This is meta-data, not actual data (content). So no matter > how elegant tracking the "." entry might be (and I think it > is, because it covers a lot of corner cases already), it puts > the information at the wrong place. Actually, I do not think there is absolute right or wrong here. The difference is not that the information is at the "right" or "wrong" place, but one approach places the information at more efficient-to-use place than the other. In that sense, the attribute approach _is_ a more elegant solution between the two. Making it an attribute has a huge practical advantage. By treating executable bit as a piece metadata, we can compare the "contents" quickly. If you "chmod +x" a blob without changing anything else, we can detect that fact, because blob object names are equal. At the philosophical level, you _could_ argue that the executable-ness is one bit of content and include that in the object name computation for the blob. There is nothing fundamentally wrong about that approach, but that destroys the nice "cheap comparability" between blobs that differ only by executable-ness. David's "." in tree is essentially the same argument as treating the executable-ness as one extra bit of content. The fact that a particular tree wants to stay even after emptied can be treated as part of contents (thereby reflected in its object name). 
There is nothing fundamentally wrong there, either. But that means two trees that contain otherwise identical set of blobs and subtrees, but differ only in the behaviour of when they are emptied, would get different object names, hence you need to descend into them to see if they are different. Using attribute that is detached from the content itself allows you to hoist that one bit one level up. By treating executable-ness not as part of content, we can compare two blobs with different executable bits cheaply. You can avoid descending into such a tree when comparing it with another tree that is different only by the "will-stay-when-emptied"-ness the same way. ^ permalink raw reply [flat|nested] 137+ messages in thread
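Editorial sketch: the "cheap comparability" point can be illustrated by computing object names by hand. The snippet below assumes git's SHA-1 object naming (a `<type> <size>\0` header followed by the body) and the tree entry layout (`<mode> <name>\0` plus the 20 raw id bytes); the helper names `object_id` and `tree_id` are invented here, and the name-based sort is a simplification that is only adequate for a flat list of files.

```python
import hashlib


def object_id(kind, body):
    # Git names an object by the SHA-1 of "<type> <size>\0" followed by its body.
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries):
    # entries: list of (mode, name, hex object id). Each entry is serialized as
    # "<mode> <name>\0" followed by the 20 raw bytes of the object id.
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return object_id("tree", body)


script = b"#!/bin/sh\necho hello\n"
blob = object_id("blob", script)  # the mode is not part of the blob, so chmod cannot change this

plain = tree_id([("100644", "run.sh", blob)])
executable = tree_id([("100755", "run.sh", blob)])

print(blob)
print(plain != executable)  # True: only the containing tree records the mode
```

Because the executable bit lives in the tree entry, two blobs that differ only in mode share one object name, while the trees that contain them do not.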
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 10:34 ` Junio C Hamano @ 2007-07-20 13:23 ` David Kastrup 2007-07-20 19:24 ` Linus Torvalds 1 sibling, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-20 13:23 UTC (permalink / raw) To: git Junio C Hamano <gitster@pobox.com> writes: > Actually, I do not think there is absolute right or wrong here. The > difference is not that the information is at the "right" or "wrong" > place, but one approach places the information at more > efficient-to-use place than the other. Agreed. > In that sense, the attribute approach _is_ a more elegant solution > between the two. Disagreed. See below. > Making it an attribute has a huge practical advantage. > > By treating executable bit as a piece metadata, we can compare the > "contents" quickly. If you "chmod +x" a blob without changing > anything else, we can detect that fact, because blob object names > are equal. At the philosophical level, you _could_ argue that the > executable-ness is one bit of content and include that in the object > name computation for the blob. There is nothing fundamentally wrong > about that approach, but that destroys the nice "cheap > comparability" between blobs that differ only by executable-ness. > > David's "." in tree is essentially the same argument as treating the > executable-ness as one extra bit of content. The fact that a > particular tree wants to stay even after emptied can be treated as > part of contents (thereby reflected in its object name). Small nit here: the tree does not want to stay after emptied, since it is not empty as long as it contains ".". > There is nothing fundamentally wrong there, either. But that means > two trees that contain otherwise identical set of blobs and > subtrees, but differ only in the behaviour of when they are emptied, > would get different object names, hence you need to descend into > them to see if they are different. 
And here we disagree in our assessment, and where I find the example of the execute bit unfitting. We are talking about _trees_ here, not files. So this is only relevant if we have a _huge_, _flat_ tree with _lots_ of entries at _bottom_ level. How often does it occur in practice that a _large_ tree has "." added or removed and nothing else changes? Never, because the normal use case is that a directory is either tracked from the start, or not tracked at all. And even if you change the tracking for a whole project at once (which is a one-time job): the cost difference is looking at all _tree_ leaf entries, not at all the involved files. > Using attribute that is detached from the content itself allows you > to hoist that one bit one level up. By treating executable-ness not > as part of content, we can compare two blobs with different > executable bits cheaply. You can avoid descending into such a tree > when comparing it with another tree that is different only by the > "will-stay-when-emptied"-ness the same way. But changing the executable bit of a file will happen often during development. Adding or removing "." will never usually be done _ever_ except when the tree is first created or removed, and then the cost is negligible. So "performance" is not an issue for making this an attribute or a flat entry. While the user level abstraction need not match the actual representation, I think that it will make for lot less special cases and problematic behavior to pull through with "." as a directory entry that mostly behaves like other files and, like other files, requires git to create a directory to contain it. All the logic for creating and deleting directories and creating and adding and ignoring files can _perfectly_ stay the same. There are just two differences: a) git always sees "." as a file in every directory in the work tree and considers it a file. 
b) when it comes to actually creating or modifying or reading the actual file in the work directory, it silently skips the operation. It would not even be necessary to give the directory entry any special attributes or permissions to make this scheme work: declaring it a normal file and just special-casing the name "." on those operations would lead to consistent and working behavior, with no change of format in index and repository at all. Possibly even a) alone would suffice, at the cost of letting git complain and continue at every operation (or making a _really_ royal mess for Solaris root users). I might be tempted to make a proof-of-concept patch for that. But for backward-compatibility, it will be better to use an entry type which old versions of git will be able to ignore when checking out or in. And for user-friendliness, one does not really want to list such entries as regular files. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 10:34 ` Junio C Hamano 2007-07-20 13:23 ` David Kastrup @ 2007-07-20 19:24 ` Linus Torvalds 2007-07-20 21:02 ` Johan Herland 1 sibling, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-20 19:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Simon 'corecode' Schubert, David Kastrup, git On Fri, 20 Jul 2007, Junio C Hamano wrote: > > Using attribute that is detached from the content itself allows > you to hoist that one bit one level up. By treating > executable-ness not as part of content, we can compare two blobs > with different executable bits cheaply. You can avoid > descending into such a tree when comparing it with another tree > that is different only by the "will-stay-when-emptied"-ness the > same way. Having thought about it a bit more, I would absolutely *detest* any kind of "executable bit" like behaviour. Why? Merging. I think one of the fundamental issues in merging is that you do it "in the working tree". This is something that pretty much *everybody* else gets wrong, and it's something where git absolutely shines. But git shines here exactly because git never tracks "history" or the state in the tree, and only ever tracks things that are indubitably real content. Which is why you never *ever* have to tell git about "I moved file X to file Y" - because git only tracks things that it can see right in front of it, in the tree. The "sticky directory" bit simply would not be something like that. It simply isn't "content", and as such, it should not be tracked. It's as easy as that. We don't want a merge of two branches to have to specify any extra data "outside" the tree as to how it should be merged. So the issue about whether a directory *exists* or not can be merged (just look at the tree), but the issue about whether the directory is supposed to be sticky is something that you'd have to tell git about *outside* of the tree, and that violates the whole point of working tree merges.
I do realize that if you use inferior operating systems, we already have these kinds of "outside the tree" data entries, thanks to issues like symlinks and normal file executable bits that you would have to explicitly tell git about when you're working in a broken environment. So in that sense, it wouldn't be anything technically new for git. But that doesn't change the fundamental issue: the limitation with executable bits and symlinks is a limitation of the broken environment, not of git. But "directories stay around after the last file is gone" is not that, it would simply be a design mistake in git itself. There are other reasons to not do it. What about file renames? Maybe the directory got *renamed*. From a pure content angle, this is "all the files in that directory went away". If you have stupid rules like "directories stay around even though all the files went away", you would again have problems with this common case. In other words: I don't care one whit about the whiners. What's MUCH more important than some random whiny person saying "Daddy, daddy, I want a pony" is whether you can afford to maintain that pony in the future. And this pony is just stupid. So here: No, you cannot have a pony. NOT YOURS. but I still think we should support the concept of importing things from other systems, and thus eventually support empty directories. Just not any crazy semantics with sticky histories. Linus PS. As usual, per-user or per-repository *local* attributes are something else. They aren't "sticky history", they are just purely behavioural defaults. Those kinds of things may make sense. But that's not a "tracking content" issue. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 19:24 ` Linus Torvalds @ 2007-07-20 21:02 ` Johan Herland 2007-07-20 21:48 ` Linus Torvalds 0 siblings, 1 reply; 137+ messages in thread From: Johan Herland @ 2007-07-20 21:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: git On Friday 20 July 2007, Linus Torvalds wrote: > [...] > > But that doesn't change the fundamental issue: the limitation with > executable bits and symlinks is a limitation of the broken environment, > not of git. But "directories stay around after the last file is gone" is > not that, it would simply be a design mistake in git itself. > > There are other reasons to not do it. What about file renames? Maybe the > directory got *renamed*. From a pure content angle, this is "all the files > in that directory went away". If you have stupid rules like "directories > stay around even though all the files went away", you would again have > problems with this common case. > > In other words: I don't care one whit about the whiners. What's MUCH more > important than some random whiny person saying "Daddy, daddy, I want a > pony" is whether you can afford to maintain that pony in the future. And > this pony is just stupid. > > So here: > > No, you cannot have a pony. NOT YOURS. > > but I still think we should support the concept of importing things from > other systems, and thus eventually support empty directories. Just not any > crazy semantics with sticky histories. Does this mean that you are firmly opposed to the concept of storing directories in the index/tree as such, or that you are only opposed to (some of) the implementation ideas that have been discussed so far? If the former is the case, does this mean that there will be no support for empty directories in git, alternatively that such support is limited to incorporating e.g. Dscho's .gitignore workaround into porcelain commands (i.e. 
"git add --directory some_dir" will be mangled/transformed into "touch some_dir/.gitignore && git add some_dir/.gitignore")? (Granted, Dscho's .gitignore workaround is fairly elegant as workarounds go, but it still reeks of inheriting a CVS misfeature.) Have fun! ...Johan -- Johan Herland, <johan@herland.net> www.herland.net ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 21:02 ` Johan Herland @ 2007-07-20 21:48 ` Linus Torvalds 2007-07-20 22:36 ` Julian Phillips 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-20 21:48 UTC (permalink / raw) To: Johan Herland; +Cc: git On Fri, 20 Jul 2007, Johan Herland wrote: > > Does this mean that you are firmly opposed to the concept of storing > directories in the index/tree as such, or that you are only opposed to > (some of) the implementation ideas that have been discussed so far? I've already sent out a *patch* to do so, for chissake. It handled all these cases perfectly fine, as far as I know, but I didn't test it all that deeply (and made it clear when I sent that patch out). In fact, in this whole pointless discussion, I think I'm so far the only one to have done anything constructive at all. Sad. So here's my standpoint: - people who use git natively might as well use the ".gitignore" trick. It really *does* work, and there really aren't any downsides. Those directories will stay around forever, until you decide that you don't want them any more. Problem solved. Sure, if you export the git archive into some other format, you might well want to do something about the ".gitignore" files (like just delete them, since they won't be meaningful in an SVN environment, for example, but you might also just convert them into SVN's "attributes" or whatever it is that SVN uses to ignore files). - If you don't use git natively, but just to track another thing, you could easily use the patches that I already sent out. Yes, they need more testing. Yes, you'd also probably like some user interface updates (notably "git add/rm" should be taught about directories). And yes, I probably (almost certainly) didn't handle all cases, but the patch I sent out was actually a working one. It really *did* pass my trivial tests. 
But once you start tracking empty directories *without* a .gitignore file, some things fall out of that: - git really *really* is designed to track "snapshots in time". You generate history from these snapshots. This is a very fundamental issue, and a lot of people seem to have trouble understanding the deeper implications. For example, git and hg may look similar, but git tracks "snapshots in time", and hg tracks "file histories tied together in snapshots". That really is a fundamentally different thing. And one of the fundamental results of git's approach is that content is content. There is *never* any notion of "history". A snapshot really is just that: it's a standalone thing. It *has* no history. The history comes entirely from outside. This means that the whole notion of "this directory will not go away because I added it explicitly" is a totally broken notion in git. It has a notion of "history" - something that simply DOES NOT EXIST, unless you seriously break the whole notion of "snapshots in time". In other words, when I say that git is a "content tracker", I'm serious. It tracks nothing *but* content. If some concept doesn't exist in the working tree, git doesn't track it. If it cannot be seen in the filesystem, it doesn't exist. - Contrast this with a lot of totally broken SCM's, that track "history" of files. As a result, they have absolutely *horrid* merge problems, because you can no longer just merge things in the working directory, and "the result" is the result. No, if you track history, you now have to tell the SCM about how the *history* moved, not just the content. So this is why git MUST NOT make the difference between - a directory was created explicitly and then had a few files added to it, and then had those files deleted from it and - we added a few files, we removed them The end result MUST BE the same, because the state IN THE WORKING TREE is the same! If the contents are the same, the end result must be the same. It's that simple.
And it all comes down to: "git tracks contents". Now, having said that, it doesn't matter *what* the end result is, as long as it's the same for both cases. What we do now is that when the files go away, the directory is no longer tracked. But we *could* say that when we remove files, we always add back the directory they were in if that directory still exists in the filesystem. See? Both are consistent with the "git tracks contents" notion. The only thing that is *not* consistent with that notion is to have a flag that we carry along that says "keep this directory". That's no longer content, and now you'd be tracking some internal SCM history instead. And that is a mistake. It may sound like a small mistake (and it is), but down that path lies madness. It's much better to teach people _why_ git doesn't do it, than to say "ok, git tracks content, but we have this special case where we also track something else, namely a git internal "stickiness" notion". SCM is too important to play games with. Git gets things right, and I doubt people really _realize_ that the "tracks content" is why git is so much better, and why git can do merges so much faster and more reliably than anybody else. So the rule really *must* be: - if two trees look the same in the filesystem, they *must* have the same git SHA1, because by definition, they have the same content. Anything that breaks that very simple statement is fundamentally broken. Linus PS. I realize that nobody actually seems to be writing code, and that this is a "paint the bike shed" discussion for everybody else, but just in case there are people who don't just masturbate about the color of the shed, I'd like to point out that we really *do* need to enhance the "diff" rules too, so that you can express the changes in a tree as a diff too. Because if we track empty directories, then we need to be able to also *show* the difference between a tree that has an empty directory, and one that does not. 
^ permalink raw reply [flat|nested] 137+ messages in thread
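Editorial sketch: the rule "if two trees look the same in the filesystem, they must have the same SHA1" amounts to saying that the identifier is a pure function of the current state. The Python below is a deliberately simplified stand-in, not git's real object format; `snapshot_id` and the file names are invented for illustration.

```python
import hashlib


def snapshot_id(files):
    # files: dict mapping path -> bytes content. The id depends only on this state.
    # NOTE: simplified stand-in for git's tree hashing, not git's real object format.
    digest = hashlib.sha1()
    for path in sorted(files):
        digest.update(path.encode() + b"\0" + hashlib.sha1(files[path]).digest())
    return digest.hexdigest()


# History 1: a directory gets files added, then the same files are deleted again.
state1 = {"README": b"hi\n"}
state1["docs/a.txt"] = b"a"
state1["docs/b.txt"] = b"b"
del state1["docs/a.txt"]
del state1["docs/b.txt"]

# History 2: the files under docs/ never existed.
state2 = {"README": b"hi\n"}

print(snapshot_id(state1) == snapshot_id(state2))  # True
```

A "keep this directory" flag that survives the deletions would be extra input to `snapshot_id` that cannot be read off the files themselves, which is the objection raised in the message above.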
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 21:48 ` Linus Torvalds @ 2007-07-20 22:36 ` Julian Phillips 2007-07-21 0:18 ` Linus Torvalds 0 siblings, 1 reply; 137+ messages in thread From: Julian Phillips @ 2007-07-20 22:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johan Herland, git On Fri, 20 Jul 2007, Linus Torvalds wrote: > > > On Fri, 20 Jul 2007, Johan Herland wrote: >> >> Does this mean that you are firmly opposed to the concept of storing >> directories in the index/tree as such, or that you are only opposed to >> (some of) the implementation ideas that have been discussed so far? > > I've already sent out a *patch* to do so, for chissake. It handled all > these cases perfectly fine, as far as I know, but I didn't test it all > that deeply (and made it clear when I sent that patch out). > > In fact, in this whole pointless discussion, I think I'm so far the only > one to have done anything constructive at all. Sad. There was Dscho's .gitignore based patch too ... > > So here's my standpoint: > > - people who use git natively might as well use the ".gitignore" trick. > It really *does* work, and there really aren't any downsides. Those > directories will stay around forever, until you decide that you don't > want them any more. Problem solved. > > Sure, if you export the git archive into some other format, you might > well want to do something about the ".gitignore" files (like just > delete them, since they won't be meaningful in an SVN environment, for > example, but you might also just convert them into SVN's "attributes" > or whatever it is that SVN uses to ignore files). Personally I quite like this approach - I'm going to use it to keep all the empty directories from Subversion in my importer. It seems to address everything quite neatly. I don't really understand the objections ...
especially since I can't see why you want an empty directory if you're not going to put _something_ in it - in which case, presumably you want to ignore it (so maybe a .gitignore containing * would be better than an empty one)? However, I'm sure that if people want it, they have a reason. > SCM is too important to play games with. Git gets things right, and I > doubt people really _realize_ that the "tracks content" is why git is so > much better, and why git can do merges so much faster and more reliably > than anybody else. This is the thing that made me interested in git back in April '05. I couldn't see what we were going to end up with at that point - but I was _convinced_ that due to the underlying design it was worth watching. Being a python type (sorry ... :$) hg looked interesting when it sprang up - but they threw away what I considered to be one of the most compelling features of git (at the time there wasn't the wealth of really nice tools that we now have). In fact, I really should say "Thank you Linus", since I came that close to writing an SCM from scratch myself - having been using Subversion with branches for quite some time (and CVS before that - and yes I do mean branches + CVS). Now I no longer feel the need to write an SCM - just a longing to use git. git is probably better than anything I would have come up with too. :D -- Julian --- She is descended from a long line that her mother listened to. -- Gypsy Rose Lee ^ permalink raw reply [flat|nested] 137+ messages in thread
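Editorial sketch: the ".gitignore trick" discussed above can be automated for an import. The function below is a minimal illustration in Python under stated assumptions: the name `add_placeholders` and its parameters are invented here, it skips any directory literally named `.git`, and it writes an empty placeholder by default (the `content` parameter is where an ignore pattern could go instead).

```python
import os
import tempfile


def add_placeholders(root, name=".gitignore", content=""):
    """Create a placeholder file in every empty directory under root; return the paths."""
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # never descend into the repository metadata
        if not dirnames and not filenames:
            path = os.path.join(dirpath, name)
            with open(path, "w") as handle:
                handle.write(content)
            created.append(path)
    return created


# Demo on a throwaway tree: one empty directory, one directory that already has a file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "fonts", "cache"))
    os.makedirs(os.path.join(root, "doc"))
    with open(os.path.join(root, "doc", "README"), "w") as handle:
        handle.write("hello\n")
    made = add_placeholders(root)
    print([os.path.relpath(p, root) for p in made])
```

Only `fonts/cache` receives a placeholder in the demo; `fonts` is left alone because it already contains a subdirectory, and `doc` because it already contains a file.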
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 22:36 ` Julian Phillips @ 2007-07-21 0:18 ` Linus Torvalds 2007-07-21 1:23 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-21 0:18 UTC (permalink / raw) To: Julian Phillips; +Cc: Johan Herland, git On Fri, 20 Jul 2007, Julian Phillips wrote: > On Fri, 20 Jul 2007, Linus Torvalds wrote: > > > > So here's my standpoint: > > > > - people who use git natively might as well use the ".gitignore" trick. > > It really *does* work, and there really aren't any downsides. Those > > directories will stay around forever, until you decide that you don't > > want them any more. Problem solved. > > Personally I quite like this approach - I'm going to use it to keep all the > empty directories from Subversion in my importer. It seems to address > everything quite neatly. The really sad part about this discussion is that the ".gitignore trick" is really technically no different at all from the one that David Kastrup has been advocating a few times, except he calls his ".gitignore" just ".", and seems to think that it's somehow different. It is true that ".gitignore" and "." _are_ different. But they are actually different in the sense that the ".gitignore" thing is something you can control, while the "." thing is something that is in all directories on UNIX, which is exactly why it _must_not_ be used by git to mark existence. Exactly because it has thus lost its ability to be something you can tune per-directory in the working tree! That said, I actually like my patch, because the git tree structures actually lend themselves very naturally to the "empty tree", and I know people have even built up those kinds of trees on purpose, even if the index doesn't support that notion.
So in that sense, teaching the index about an empty tree is in some ways the "right thing" to do, if only because it means that the index can finally express something that the tree objects themselves have always been able to validly encode. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
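The remark that tree objects have always been able to encode an empty tree can be checked by hand. What follows is a small sketch (the helper name git_object_id is made up here): a git object id is the SHA-1 of a "<type> <size>\0" header followed by the body, and an empty tree simply has a zero-length body.

```python
import hashlib

def git_object_id(kind, body):
    """SHA-1 of a loose git object: '<kind> <size>' + NUL + body."""
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# A tree with no entries at all is a perfectly valid object.
EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

So the object store has no trouble with the concept. As the message says, the missing piece is the index, which at this point holds no directory entries and therefore cannot express "this directory exists but is empty".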
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 0:18 ` Linus Torvalds @ 2007-07-21 1:23 ` David Kastrup 2007-07-21 3:54 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-21 1:23 UTC (permalink / raw) To: git Linus Torvalds <torvalds@linux-foundation.org> writes: > The really sad part about this discussion is that the ".gitignore > trick" is really technically no different at all from the one that > David Kastrup has been advocating a few times, except he calls his > ".gitignore" just ".", and seems to think that it's somehow > different. Oh no, I don't think at all that it is somehow different: actually this is _exactly_ the reason why I think that the implementation will be doable even by an idiot like myself, and that is because at least in my first iteration, "." will appear as an empty regular file to git, just like ".gitignore". The main worry I had was that putting "." inside of a gitignore entry might stop "git add ." from working like previously. But I tried it, and it works just like it would with ".gitignore". Or rather like it would with ".notignore" since ".gitignore" _is_ specially treated by git, after all. > It is true that ".gitignore" and "." _are_ different. > > But they are actually different in the sense that the ".gitignore" > thing is something you can control, while the "." thing is something > that is in all directories on UNIX, which is exactly why it > _must_not_ be used by git to mark existence. But I don't plan to have it used by git to mark existence. The _existence_ can be taken for granted. But what can't be taken for granted, like with any other file, is that the file is actually being tracked by git. To have it tracked, you need to add it, and it must not be covered by gitignore. > Exactly because it has thus lost its ability to be something you can > tune per-directory in the working tree! But it should not let the user lose his ability to let or let not git track the file. 
> That said, I actually like my patch, because the git tree structures > actually lend themselves very naturally to the "empty tree", and I > know people have even built up those kinds of trees on purpose, even > if the index doesn't support that notion. And that is the reason I will be working with the "empty file ." metaphor: it would be way above my head to make the index support new file types or even structures, and change the evaporate-when-empty semantics of trees and so on, while catching all special cases. I have no chance in hell to implement a new feature with a reasonable amount of time and work. That's a task for people with a larger brain than mine who have my full admiration and respect. The best I can hope to achieve is a clever hack. And if that works, people can still pile exceptions on it and redo it as a "proper feature". You are _perfectly_ correct that my proposal is _not_ a jot different from registering a regular empty file ".notignore", and it is on _purpose_, because I could not handle the complications if it were. The only difference is that I am calling the file ".". Which is in _all_ respects nothing more than a naming convention. However, this convention has distinct advantages over ".notignore": a) I don't have to depart as far from reality. Whenever I try registering ".", I can rely on the work directory actually _having_ "." as a _real_, not a pseudofile. It will not actually be a _regular_ file as I'll tell git: that's a wart of my prototype implementation which will, no doubt, eventually be fixed by others _if_ the code does its job fine apart from being ugly to look at. It may not be even necessary internally to think of "." other than as an empty regular file, but git should probably not talk too loud about it lest people laugh at it. b) it already means something to people. Now this is a two-edged sword, since "almost, but not quite, entirely unlike" concepts are not necessarily helpful in computing. 
In this case, however, I think the match is close enough to help people understand what is going on rather than the other way round. "." was introduced because people wanted to have a good way to refer to a directory as an element of itself. So using "." as a self-reference for a directory is quite in the spirit of that name. > So in that sense, teaching the index about an empty tree is in some > ways the "right thing" to do, if only because it means that the > index can finally express something that the tree objects themselves > have always been able to validly encode. If you define the tree objects by the physical in-memory or in-repository data structures encoding them, then you are correct. I am somewhat reluctant to parade around another red cape, but in this particular case, the size of the wet spot in my pants does not as much relate to the physical layout of the data structure (big deal, probably 30 lines of code all around), but rather to the extent and assumptions of functions accessing it. Namely, data layout and accessor functions _together_ constitute a tree object. So for me the "evaporate-when-empty" property, while not inherent in the physical layout of the object, is still an inherent part of its structure which I would not want to touch: finding and fixing and debugging all code elements which explicitly or implicitly rely on that assumption is something I would not entrust myself with. I might have been more inclined to dabble with that approach if the tree stuff were written in something more object-oriented, say, clean and concise C++, except that clean and concise C++ code in the wild is even more of a mythical beast than clean and concise TeX code, and C++ itself is such a mindboggingly complex contraption... I digress. All the best, -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 1:23 ` David Kastrup @ 2007-07-21 3:54 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-21 3:54 UTC (permalink / raw) To: git David Kastrup <dak@gnu.org> writes: > The only difference is that I am calling the file ".". Which is in > _all_ respects nothing more than a naming convention. > > However, this convention has distinct advantages over ".notignore": > > a) I don't have to depart as far from reality. Whenever I try > registering ".", I can rely on the work directory actually _having_ > "." as a _real_, not a pseudofile. It will not actually be a > _regular_ file as I'll tell git: that's a wart of my prototype > implementation which will, no doubt, eventually be fixed by others > _if_ the code does its job fine apart from being ugly to look at. Update: well, I am still digging through the code, but this is all so well factored that it might be perfectly feasible to have S_ISDIR entries after all without too much of a hassle. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
[parent not found: <7vir8f24o2.fsf@assigned-by-dhcp.cox.net>]
* Re: [RFC PATCH] Re: Empty directories... [not found] ` <7vir8f24o2.fsf@assigned -by-dhcp.cox.net> @ 2007-07-20 5:53 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-20 5:53 UTC (permalink / raw) To: git Junio C Hamano <gitster@pobox.com> writes: > Linus Torvalds <torvalds@linux-foundation.org> writes: > >> So the whole notion of "remembering" whether a directory was added >> explicitly as an empty directory or not is just not a sensible concept in >> git. > > That is true if it is implemented as David suggested, to have a > phony "." entry in the tree object itself. The object name of such > a tree (when it contains blobs and trees underneath) will be > different from a tree that contains the same set of blobs and trees. > It would destroy the fundamental concepts of git. How so? > But you _could_ treat that "should-be-kept-even-when-empty"-ness > just like we treat executable bit on blobs, I think. > > When blobs with the same contents but of different type (REG vs LNK) > and regular file with or without executable bit are entered in git, > they all get the same SHA-1 but we can still tell them apart because > the index and the tree entry have mode bits. So hypothetically, you > could introduce "sticky" directory in tree entries to mark "this > will not go away when emptied". A tree containing files with and without executable bits will show different SHA-1 sums. There is no reason that this should be different for a tree containing the conceptual "." or not. I won't fight for a specific implementation but if I am going to implement this (and the current lack of enthusiasm points to that) I will not go and duplicate the entire ignore/add/rm/index/repository machinery in order to have a bit rather than an actual "." directory entry. Most Unix file systems have an honest, physical, down-to-Earth directory entry "." even on disk because it _simplifies_ matters, even though one could special-case "." 
all throughout and make do without a physical entry in theory. And, as I explained, "." lends itself perfectly to the gitignore machinery in order to policy projects to track or not track directories. > In a 'tree' object, they might appear as: > > 40000 ordinary-directory '\0' 20-byte SHA-1 > 41000 directory-dontremove-even-if-empty '\0' 20-byte SHA-1 > > In 'index', as your "I'm soft" patch, we do not have to add > nonsticky kind of tree nodes, It does not work, since then you can't distinguish mkdir A touch B git-add A/B from mkdir A touch B git-add A It is very clear that git-rm A/B _mustn't_ leave an empty directory in the first case, and _must_ leave an empty directory in the second case _if_ and only if one tracks directories. > Obviously, this "sticky" bit will cascade up and make your otherwise > equivalent parent tree's different, No, it must not "cascade up". After mkdir -p A/B touch A/B/C git-add A/B git-rm A/B there must be nothing tracked by git. The "sticky" bit does not "cascade up". Its upward effect is only changing the SHA-1 of the tree, like any change below does. > This will involve a lot of changes, so I would not recommend anybody > doing so, though. Neither would I. Why people want to complicate the code base everywhere by avoiding to treat "." like a legitimate entry (as Unix file systems do for a _reason_) is simply a miracle to me. The framework is pretty much _there_. There is no point in not making use of it and duplicating the whole machinery because we want a "bit set" implementation instead of a file name. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
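The point about mode bits feeding into the tree's SHA-1 can be illustrated with a short sketch. It assumes the tree entry layout Junio quotes (ASCII mode, space, name, NUL, 20-byte SHA-1); the helper names object_id and tree_body are invented for this example.

```python
import hashlib

def object_id(kind, body):
    """SHA-1 of a git object: '<kind> <size>' + NUL + body."""
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Serialize (mode, name, hex_sha) tuples, assumed to be pre-sorted."""
    return b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in entries
    )

EMPTY_BLOB = object_id("blob", b"")

# Same name, same blob, only the executable bit differs.
plain = object_id("tree", tree_body([("100644", "run", EMPTY_BLOB)]))
executable = object_id("tree", tree_body([("100755", "run", EMPTY_BLOB)]))
print(plain != executable)  # True: the mode is part of the hashed tree body
```

The blob id is identical in both trees, but the trees hash differently because the mode is part of each entry. Either proposal, a hypothetical "sticky" mode such as 41000 or an extra "." entry, would change the containing tree's SHA-1 in the same way.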
* Re: [RFC PATCH] Re: Empty directories... 2007-07-20 0:15 ` Linus Torvalds 2007-07-20 0:33 ` Linus Torvalds @ 2007-07-20 10:19 ` Olivier Galibert 1 sibling, 0 replies; 137+ messages in thread From: Olivier Galibert @ 2007-07-20 10:19 UTC (permalink / raw) To: Linus Torvalds Cc: Junio C Hamano, Brian Gernhardt, David Kastrup, Shawn O. Pearce, Matthieu Moy, Johannes Schindelin, Git Mailing List On Thu, Jul 19, 2007 at 05:15:28PM -0700, Linus Torvalds wrote: > (*) And, for anybody confused about the issue, the answer to the latter > question is an emphatic: "Yes it should, live with it, and if you want the > directory back, you had better add it back as an empty directory" Wouldn't it be perfectly reasonable for git rm to re-add emptied directories as empty transparently if the appropriate flag/configuration is set? rm is porcelain after all. OG. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 5:28 ` Junio C Hamano 2007-07-19 5:38 ` Shawn O. Pearce @ 2007-07-19 5:59 ` David Kastrup 2007-07-19 9:54 ` David Kastrup 1 sibling, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-19 5:59 UTC (permalink / raw) To: git Junio C Hamano <gitster@pobox.com> writes: > Another issue I thought about was what you would do in the step > 3 in the following: > > 1. David says "mkdir D; git add D"; you add S_IFDIR entry in > the index at D; > > 2. David says "date >D/F; git add D/F"; presumably you drop D > from the index (to keep the index more backward compatible) > and add S_IFREG entry at D/F. I don't think that one should drop D here. Operation 1 _is_ not backward compatible, so if you want to revert it, you should explicitly remove D. And we can't "keep" the index backward compatible if it isn't so after step 1. > 3. David says "git rm D/F". > > Have we stopped keeping track of the "empty directory" at this > point? The case I am worrying about is rather mkdir D mkdir D/E touch D/E/file git add D [*] git rm D/E/file From a user perspective, E should be registered still. Compare this with mkdir D mkdir D/E touch D/E/file git add D/E/file [*] git rm D/E/file Where likely both D and E should now be considered unregistered. So the situation is different between the first or the second [*], and the difference might be impossible to express completely in the frame of a backwards-compatible index, even though we don't track an empty directory at the point [*] at all, and the only registered _file_ is D/E/file. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-19 5:59 ` David Kastrup @ 2007-07-19 9:54 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-19 9:54 UTC (permalink / raw) To: git David Kastrup <dak@gnu.org> writes: > Junio C Hamano <gitster@pobox.com> writes: > >> Another issue I thought about was what you would do in the step >> 3 in the following: >> >> 1. David says "mkdir D; git add D"; you add S_IFDIR entry in >> the index at D; >> >> 2. David says "date >D/F; git add D/F"; presumably you drop D >> from the index (to keep the index more backward compatible) >> and add S_IFREG entry at D/F. > > I don't think that one should drop D here. Operation 1 _is_ not > backward compatible, so if you want to revert it, you should > explicitly remove D. And we can't "keep" the index backward > compatible if it isn't so after step 1. > >> 3. David says "git rm D/F". >> >> Have we stopped keeping track of the "empty directory" at this >> point? > > The case I am worrying about is rather > > mkdir D > mkdir D/E > touch D/E/file > git add D > [*] > git rm D/E/file > > From a user perspective, E should be registered still. Compare this > with > > mkdir D > mkdir D/E > touch D/E/file > git add D/E/file > [*] > git rm D/E/file Let's take this through the motions with my last proposal: at the first [*], the index now contains D/. [dir] D/E/. [dir] D/E/file [file] After git rm D/E/file, it contains D/. [dir] D/E/. [dir] Compared with the second, where we just have in the index D/E/file [file] and it is gone again after the remove. After committing in the first case, we have in the repository D [tree] D/. [dir] D/E [tree] D/E/. [dir] D/E/file [file] Now we do git rm D/E, and the index contains D/E/. [remove dir] D/E/file [remove file] If we commit now, D/E [tree] becomes empty and is removed. All that stays is D [tree] D/.
[dir] So we still have [tree] items only in the repository, not in the index, and there is no such thing as an empty tree. But directories have a presence in index and repository. They are not containers of files, that role is retained by trees. Rather they are siblings of the files in their associated tree. As a note aside: if one wanted to track directory permissions, one would track them in the [dir] entries, not in the [tree] entries. Trees remain abstract structuring entities in the repository that don't have an outside representation. Directories will be auto-created and deleted as necessary in the work directory to facilitate having a place for checking tree elements out and in. This means that git add D/E/file would _not_ track permissions of D and E (nor their existence). However, Linus is right that permissions are something to be discussed separately. But separating [tree] and [dir] makes for a plausible and understandable way of treating them. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
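The walkthrough above can be replayed with a toy model. This is only a sketch of David's proposal as described in this message, not of anything git implements: the index is modelled as a plain set of paths in which a tracked directory appears as a "<dir>/." entry, and trees are whatever the remaining entries imply. The function names implied_trees and remove are invented here.

```python
def implied_trees(index):
    """Directories that must exist because some index entry lives in them."""
    trees = set()
    for path in index:
        parent = path.rsplit("/", 1)[0] if "/" in path else ""
        while parent:
            trees.add(parent)
            parent = parent.rsplit("/", 1)[0] if "/" in parent else ""
    return trees

def remove(index, target):
    """Model of 'git rm': drop the entry itself and everything below it."""
    return {p for p in index if p != target and not p.startswith(target + "/")}

# First case: "git add D" with directory tracking switched on.
first = {"D/.", "D/E/.", "D/E/file"}
# Second case: "git add D/E/file", directories are not tracked.
second = {"D/E/file"}

print(sorted(implied_trees(remove(first, "D/E/file"))))   # ['D', 'D/E']
print(sorted(implied_trees(remove(second, "D/E/file"))))  # []
print(sorted(implied_trees(remove(first, "D/E"))))        # ['D']
```

In this model a tree never needs to be empty: D/E survives the removal of its only file because the sibling entry "D/E/." still lives inside it, which is the [tree] versus [dir] distinction the message draws.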
[parent not found: <alpine.LFD.0.999.0707181710271.27353@woody.linux-foundation.org>]
* Re: [RFC PATCH] Re: Empty directories... [not found] ` <alpine.LFD.0.999.0707181710271.27353@woody.linux-foundation.org> @ 2007-07-22 21:08 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-22 21:08 UTC (permalink / raw) To: git Well, coming back to this posting in order to focus on some points that were at a level more relevant to the implementation. And I'll go through the questions assuming my permissions-based proposal. Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 19 Jul 2007, David Kastrup wrote: >> >> Well, kudos. Together with the analysis from Junio, this seems like a >> good start. Would you have any recommendations about what stuff one >> should really read in order to get up to scratch about git internals? > > Well, you do need to understand the index. That's where all the new > subtlety happens. > > The data structures themselves are trivial, and we've supported > empty trees (at the top level) from the beginning, so that part is > not anything new. > > However, now having a new entry type in the index (S_IFDIR) means > that anything that interacts with the index needs to think > twice. But a lot of that is just testing what happens, and so the > first thing to do is to have a test-suite. Yes. > There's also the question about how to show an empty tree in a > diff. Well, there are two possibilities involved here, a more and a less chatty one. Assuming that we want to do as little work as possible, the transition between a tracked and a non-tracked directory will be given in one of the following manners: Either: a) xxx: old mode 000000 xxx: new mode 040755 when a directory gets tracked and xxx: new mode 040755 xxx: old mode 000000 when it gets untracked again. or b) xxx: new directory mode 040755 when a directory gets tracked and xxx: deleted directory mode 040755 when it gets untracked again.
Note that "new" does not mean that git did not previously have had files that absolutely have required a directory for placing. It just means that it has now actively gained knowledge about the directory. In a similar vein, "deleted" means that git is just deleting its knowledge about the directory, _scheduling_ it for a single deletion attempt at the earliest (and actually also latest) opportunity: when git happens to know about no more files that require keeping the directory around. So perhaps the following would be more readable: xxx: tracking directory mode 040755 xxx: forgetting directory mode 040755 Now in order to cut down on the verbiage, it might be an option to transmit those strings only when something happens that can't be deduced from other data. Because _if_ it can be deduced from other data (like a directory being present when files in it are), then at least the working copies are identical as long as both persons don't start deleting files from the repository. If they do so, when a directory becomes empty, the other side needs to know whether the directory is being tracked or not if it still wants to maintain the same state in the working tree. But if we really want to have not just the working tree but also the repositories in SHA1-lockstep, we can't delay transmitting this information. > We've never had that: the only time we had empty trees was when we > compared a totally empty "root" tree against another tree, and then > it was obvious. But what if the empty tree is a subdirectory of > another tree - how do you express that in a diff? Do you care? Right > now, since we always recurse into the tree (and then not find > anything), empty trees will simply not show up _at_all_ in any > diffs. One would still recurse. > And what about usability issues elsewhere? With my patch, doing something > like a > > git add directory/ > > still won't do anything, because the behaviour of "git add" has always > been to recurse into directories. 
This will remain the same, but the directory itself will be added if and only if the corresponding preference variable is set, regardless of whether the directory is empty. > So to add a new empty directory, you'd have to do > > git update-index --add directory > > and that's not exactly user-friendly. Presumably one could, if one really wanted an explicit way, have git add --directory directory in analogy to the --directory option of the ls command. But I think that in most cases one would not want to treat one directory different from the whole tree, so the implicit behavior regulated by a project-wide preference should be sufficient in general. > So do you add a "-n" flag to "git add" to tell it to not recurse? Or > do you always recurse, but then if you notice that the end result is > empty, you add it as a directory? I always recurse (unless there is a --directory option and I have some strange desire to actually use it). I add it as a directory, regardless of whether it is empty or not, if my preference setting (or gitignore or whatever) is set to tracking directories. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-18 23:16 ` [RFC PATCH] " Linus Torvalds 2007-07-18 23:40 ` Linus Torvalds 2007-07-18 23:42 ` David Kastrup @ 2007-07-21 4:29 ` David Kastrup 2007-07-21 4:51 ` Linus Torvalds [not found] ` <alpine.LFD.0.999.0707202135450.27249@woody.linux-foundation.org> 2 siblings, 2 replies; 137+ messages in thread From: David Kastrup @ 2007-07-21 4:29 UTC (permalink / raw) To: git Linus Torvalds <torvalds@linux-foundation.org> writes: > This really updates three different areas, which are nicely > separated into three different files, so while it's one single > patch, you can actually follow along the changes by just looking at > the differences in each file, which directly translate to separate > conceptual changes: Ok, I have now acquired enough passing familiarity with the code that I find part of my way around it. Most of your patch looks like it caters for the S_ISDIR type not previously in use in the index (how about the repository?). So that makes for quite a bit of nicer looks. The disadvantage is that it introduces a new data type and thus one has to check all the code paths to see how older versions of git will cater with newer data. My idea of a fake zero-length file would have had predictable side effects: For checking out, git would have created the directory it needed to place the "file", then try to write an empty file called "." and failing. Apart from an error message (if we aren't root on Solaris), this would have worked exactly as intended. For deletion on checking out, git would have tried deleting "." and failed. I have not checked the code to see whether git takes this as a clue not to attempt deleting the containing directory. If not, again stuff would have worked as intended. If yes, well, the user needs to clean up manually. I am not sure what code paths are executed when using S_ISDIR now in unmodified git.
As a theoretical question for now: do git repositories carry some versioning inside them? Something like "don't touch me if you are not at least version x"? Anyway, the code becomes quite less of a dirty hack by using that data type, so I am pretty much taking your code (which has no overlap to the work I have done already) as is. Seems like it should play together quite nicely with my own stuff. So thanks for doing the heavy lifting in a difficult area. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 4:29 ` David Kastrup @ 2007-07-21 4:51 ` Linus Torvalds 2007-07-21 5:08 ` Linus Torvalds [not found] ` <alpine.LFD.0.999.0707202135450.27249@woody.linux-foundation.org> 1 sibling, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-21 4:51 UTC (permalink / raw) To: David Kastrup; +Cc: git On Sat, 21 Jul 2007, David Kastrup wrote: > > Ok, I have now acquired enough passing familiarity with the code that > I find part of my way around it. Most of your patch looks like it > caters for the S_ISDIR type not previously in use in the index (how > about the repository?). The object database has always had S_ISDIR (well, "always" is since very early on, when I realized that flat trees didn't cut it). > The disadvantage is that it introduces a new data type and thus one > has to check all the code paths to see how older versions of git will > cater with newer data. Take a look at the "subproject" patches - those did the same (adding the notion of a gitlink to the index), except those also changed how the tree object looked, since now a tree could contain pointers to commits too. > My idea of a fake zero-length file would have had predictable side > effects: As far as I can tell, it would have been exactly the same thing as the S_IFDIR, just instead of the S_IFDIR check, you'd have had to check the end of the filename for being '/'. Otherwise? Exactly the same. Except for the fact that we already supported S_IFGITLINK for subprojects (and there it matches the "struct tree" entry, so it really *does* make more sense that way), so supporting S_IFDIR was actually easier. But hey, that's an implementation detail. I don't actually care all that much. In many ways, the "long-term" data structures are much more important than the index, the index is a purely temporary - and even more importantly - a purely local datastructure.
The more important thing is in many ways the object storage, and that's also the reason for doing the index the way I did - it more closely matches what the object storage does (ie the "index" ends up mirroring a linearized and unpacked "tree" object). Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 4:51 ` Linus Torvalds @ 2007-07-21 5:08 ` Linus Torvalds 2007-07-21 5:28 ` David Kastrup 2007-07-28 8:44 ` David Kastrup 0 siblings, 2 replies; 137+ messages in thread From: Linus Torvalds @ 2007-07-21 5:08 UTC (permalink / raw) To: David Kastrup; +Cc: git On Fri, 20 Jul 2007, Linus Torvalds wrote: > > As far as I can tell, it would have been exactly the same thing as the > S_IFDIR, just instead of the S_IFDIR check, you'd have had to check the > end of the filename for being '/'. BTW, there is actually one big difference, and the '/' at the end actually has one huge advantage. Why? Because my preliminary patches sort the index entries wrong. A directory should always sort *as*if* it had the '/' at the end. See base_name_compare() for details. And we've never done that for the index, because the index has never had this issue (since it never contained directories). So sit down and compare base_name_compare (for tree entries) with cache_name_compare() (for index entries), and see how the latter doesn't care about the type of names. This was actually something that I hit already with subproject support, and one of my very first patches even had some (aborted) code to start sorting subprojects in the index the way we sort directories. And I *should* have done it that way, but I never did. It now makes the S_ISDIR handling harder, because directories really do have to be sorted as if they had the '/' at the end, or "git-fsck" will complain about bad sorting. Sad, sad, sad. It effectively means that S_IFGITLINK is *not* quite the same as S_IFDIR, because they sort differently. Duh. Of course, it seldom matters, but basically, you should test a directory structure that has the files dir.c dir/test in it, and the "dir" directory should always sort _after_ "dir.c". And yes, having the index entry with a '/' at the end would handle that automatically. 
As it is, with the "mode" difference, it instead needs to fix up "cache_name_compare()". Admittedly, that would actually be a cleanup (since it would now match base_name_compare() in logic, and could actually use that to do the name comparison!), but it's a damn painful cleanup because we don't even pass in the mode to "cache_name_compare()", since we never needed it. Gaah. cache_name_compare itself isn't used in that many places, but it's used by "index_name_pos()/cache_name_pos()", which *is* used in many places. And again, that one doesn't even have the mode, so it cannot pass it down. So it probably *is* easier to add the '/' at the end of the name instead, to make directories sort the right way in the index. I'd still suggest you *also* make the mode be S_IFDIR, though (and preferably make git-fsck actually verify that the mode and the last character of the name matches!). Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
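The sorting rule described in this message can be sketched in a few lines. This is a Python paraphrase of the idea, not git's actual base_name_compare() code, and the function name tree_name_compare is invented: names compare bytewise, except that a directory name is treated as if it ended in '/'.

```python
from functools import cmp_to_key

def tree_name_compare(a, b):
    """Compare (name, is_dir) pairs the way tree entries are ordered.

    Names are bytes. When one name is a prefix of the other, the
    character "after the end" is '/' for a directory and NUL otherwise.
    """
    (name1, is_dir1), (name2, is_dir2) = a, b
    n = min(len(name1), len(name2))
    if name1[:n] != name2[:n]:
        return -1 if name1[:n] < name2[:n] else 1
    c1 = name1[n] if n < len(name1) else (ord("/") if is_dir1 else 0)
    c2 = name2[n] if n < len(name2) else (ord("/") if is_dir2 else 0)
    return (c1 > c2) - (c1 < c2)

entries = [(b"dir", True), (b"dir.c", False)]

# Mode-blind comparison, as in cache_name_compare(): "dir" sorts first.
plain_order = sorted(entries, key=lambda e: e[0])
# Tree order: "dir" behaves like "dir/", and '/' (0x2f) sorts after '.' (0x2e).
tree_order = sorted(entries, key=cmp_to_key(tree_name_compare))

print(plain_order)  # [(b'dir', True), (b'dir.c', False)]
print(tree_order)   # [(b'dir.c', False), (b'dir', True)]
```

This also shows why storing the directory as "dir/" in the index would sidestep the problem: a mode-blind bytewise comparison of "dir.c" against "dir/" already produces the tree order.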
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 5:08 ` Linus Torvalds @ 2007-07-21 5:28 ` David Kastrup 2007-07-21 15:53 ` Linus Torvalds 2007-07-28 8:44 ` David Kastrup 1 sibling, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-21 5:28 UTC (permalink / raw) To: git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, 20 Jul 2007, Linus Torvalds wrote: >> >> As far as I can tell, it would have been exactly the same thing as the >> S_IFDIR, just instead of the S_IFDIR check, you'd have had to check the >> end of the filename for being '/'. > > BTW, there is actually one big difference, and the '/' at the end actually > has one huge advantage. > > Why? Because my preliminary patches sort the index entries wrong. A > directory should always sort *as*if* it had the '/' at the end. Hm, that's bad. The thing is that the directory names I am tracking are called "." (that's what I was currently trying to reconcile your code with). > And I *should* have done it that way, but I never did. It now makes > the S_ISDIR handling harder, because directories really do have to > be sorted as if they had the '/' at the end, or "git-fsck" will > complain about bad sorting. Hm, I'll have to check what git-fsck does. > Of course, it seldom matters, but basically, you should test a directory > structure that has the files > > dir.c > dir/test > > in it, and the "dir" directory should always sort _after_ "dir.c". > > And yes, having the index entry with a '/' at the end would handle > that automatically. You completely lost me here. I guess I'll be able to pick this up only after investing considerable more time into the data structures. And I have to goto bed right now. > As it is, with the "mode" difference, it instead needs to fix up > "cache_name_compare()". 
Admittedly, that would actually be a cleanup > (since it would now match base_name_compare() in logic, and could > actually use that to do the name comparison!), but it's a damn > painful cleanup because we don't even pass in the mode to > "cache_name_compare()", since we never needed it. > > Gaah. > > cache_name_compare itself isn't used in that many places, but it's > used by "index_name_pos()/cache_name_pos()", which *is* used in many > places. And again, that one doesn't even have the mode, so it > cannot pass it down. > > So it probably *is* easier to add the '/' at the end of the name instead, > to make directories sort the right way in the index. I'd still suggest you > *also* make the mode be S_IFDIR, though (and preferably make git-fsck > actually verify that the mode and the last character of the name > matches!). The _flattened_ directory name would end in /. in my scheme. I would not want to use "xxx/" for a directory name, and "xxx" for a tree: that would be completely backwards. And I also don't like the duplication of xxx when listing objects. Sure, that's an implementation detail, but I don't like implementations hurting my eyes... -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 5:28 ` David Kastrup @ 2007-07-21 15:53 ` Linus Torvalds 2007-07-21 17:38 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-21 15:53 UTC (permalink / raw) To: David Kastrup; +Cc: git

On Sat, 21 Jul 2007, David Kastrup wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > Of course, it seldom matters, but basically, you should test a directory
> > structure that has the files
> >
> >     dir.c
> >     dir/test
> >
> > in it, and the "dir" directory should always sort _after_ "dir.c".
> >
> > And yes, having the index entry with a '/' at the end would handle
> > that automatically.
>
> You completely lost me here. I guess I'll be able to pick this up
> only after investing considerable more time into the data structures.

So the basic issue is that not only does git obviously think that only content matters, but it describes it with a single SHA1.

That's not an issue at all for a single file, but if you want to describe *multiple* files with a single SHA1 (which git obviously very much wants to do), the way you generate the SHA1 matters a lot.

In particular, the order.

So git is very very strict about the ordering of tree structures. A tree structure is not just a random list of

    <ASCII mode> + <space> + <filename> + <NUL> + <SHA1>

it's very much an _ordered_ list of those things, because we want the SHA1 of the tree to be well-specified by the contents, and that means that the contents of a tree object has to have absolutely _zero_ ambiguity.

This means, for example, that git is very fundamentally case sensitive. There's no sane way *not* to be, because if you're case insensitive in any way at all, you'll end up having two trees that are "the same", but end up having different SHA1's.

It also means that git objects have absolutely zero "localization". There is no locale at all, and there very fundamentally *must*not* be.
Again, for the same reason: if you can describe the same filename with two different encodings, you'd have two different SHA1's for the same content.

So git filenames are very much a "stream of bytes", not anything else. And they need to sort 100% reliably, always the same way, and never with any localized meaning.

And, partly because it seemed most natural, and partly for historical reasons, the way git sorts filenames is by sorting by *pathname*. So if you have three files named

    a.c
    a/c
    abc

then they sort in that exact order, and no other! They sort as a "memcmp" in the full pathname, and that's really nice when you see whole collections of files, and you know the list is globally sorted.

So that "global pathname sorting" has nice properties, and it seems "obvious", but it means that because git actually *encodes* those three files hierarchically as two different trees (because there's a subdirectory there), the tree objects themselves sort a bit oddly. The tree objects themselves will look like

    top-level tree:
        100644 a.c -> blob1
        040000 a -> tree2
        100644 abc -> blob3

    sub-tree:
        100644 c -> blob2

and notice how the *tree* is not sorted alphabetically at all. It has a subtly different sort, where the entry "a" sorts *after* the entry "a.c", because we know that it's a tree entry, and thus will (in the *global* order) sort as if it had a "/" at the end!

Traditionally, when we have the index, the index sorting has been very simple: you just sort the names as memcmp() would sort them. But note how that changes, if "a" is an empty directory. Now the index needs to sort as

    file a.c
    dir a
    file abc

because when we create the tree entry, it needs to be sorted the same way all tree entries are always sorted - as if "a" had a slash at the end!
[ Yeah, yeah, we could make a special case and just say "the empty tree sorts differently", but that actually results in huge problems when doing a "diff" between two trees: our diff machinery very much depends on the fact that the index and the trees always sort the same way, and if we sorted the "a" entry (when it is an empty directory) differently from the "a" entry (when it has entries in it), that would just be insane and cause no end of trouble for comparing two trees - one with an empty directory and one with content added to that directory. So the sorting is doubly important: it's what makes "one content" always have the same SHA1, but it is also much easier and efficient to compare directories when we know they are sorted the same way. ] In other words, introducing tree entries in the index ended up also introducing all the issues that we already had with the tree objects since they got split up hierarchically, but that the code didn't use to have to care about. The easiest way to solve this really does seem to be to add the rule that the index entry for an empty directory has to have the "/" at the end of the name - then the "sort mindlessly by name" will just continue to work. But that was what I said was broken: my patches I sent out didn't actually do that. It's *probably* just a few lines of code, and it actually would result in some nice changes ("git ls-files" would show a '/' at the end of an empty directory entry, for example), so this is not a big deal, but it's an example of how subtly different a directory is from a file when it comes to git. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
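The ordering rule described in this message can be sketched in a few lines of Python. This is only an illustration of the rule as stated (bytewise comparison, with a directory compared as if its name ended in "/"); it is not git's code, and the helper name `tree_sort_key` and the `(name, is_dir)` pair layout are made up for the example.

```python
# Illustrative sketch, not git's implementation: entries are compared
# bytewise (memcmp-style), and a directory is compared as if its name
# carried a trailing "/".

def tree_sort_key(entry):
    """entry is a hypothetical (name, is_dir) pair."""
    name, is_dir = entry
    raw = name.encode("utf-8")
    return raw + b"/" if is_dir else raw

entries = [("abc", False), ("a", True), ("a.c", False)]

# Naive byte order of the bare names: what a plain memcmp sort would give.
plain_order = [name for name, _ in sorted(entries, key=lambda e: e[0].encode("utf-8"))]

# Tree order: "a" is a directory, so it is compared as "a/".
tree_order = [name for name, _ in sorted(entries, key=tree_sort_key)]

print(plain_order)
print(tree_order)
```

The two lists differ only in where "a" lands, which is exactly the case an index entry for an empty directory would have to get right.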
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 15:53 ` Linus Torvalds @ 2007-07-21 17:38 ` David Kastrup 2007-07-21 17:52 ` Simon 'corecode' Schubert ` (2 more replies) 0 siblings, 3 replies; 137+ messages in thread From: David Kastrup @ 2007-07-21 17:38 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Sat, 21 Jul 2007, David Kastrup wrote: > >> Linus Torvalds <torvalds@linux-foundation.org> writes: >> >> > Of course, it seldom matters, but basically, you should test a directory >> > structure that has the files >> > >> > dir.c >> > dir/test >> > >> > in it, and the "dir" directory should always sort _after_ "dir.c". >> > >> > And yes, having the index entry with a '/' at the end would handle >> > that automatically. >> >> You completely lost me here. I guess I'll be able to pick this up >> only after investing considerable more time into the data structures. [Basic explanation about git sort order and trees sorting as tree/ in order to be in the right sort order for a prefix] Ok, I could not have figured this out on my own. Are there any design documents or does one just have to pester the list? > So the basic issue is that not only does git obviously think that only > content matters, but it describes it with a single SHA1. > > That's not an issue at all for a single file, but if you want to describe > *multiple* files with a single SHA1 (which git obviously very much wants > to do), the way you generate the SHA1 matters a lot. > > In particular, the order. > > So git is very very strict about the ordering of tree structures. A tree > structure is not just a random list of > > <ASCII mode> + <space> + <filename> + <NUL> + <SHA1> Ok. > So git filenames are very much a "stream of bytes", not anything > else. And they need to sort 100% reliably, always the same way, and > never with any localized meaning. 
There is some utf-8/Unicode trouble to be expected in connection with that eventually: some, but not all operating and/or file systems canonicalize file names, replacing accented letters by a combining accent and the letter. But that's beside the point.

> And, partly because it seemed most natural, and partly for
> historical reasons, the way git sorts filenames is by sorting by
> *pathname*. So if you have three files named
>
>     a.c
>     a/c
>     abc
>
> then they sort in that exact order, and no other! They sort as a
> "memcmp" in the full pathname, and that's really nice when you see
> whole collections of files, and you know the list is globally
> sorted.

It is amusing that my description of git having no external concept of directories except as an expedience for representing slashes in filenames was much closer to the mark than I would have expected.

> So that "global pathname sorting" has nice properties, and it seems
> "obvious", but it means that because git actually *encodes* those three
> files hierarchically as two different trees (because there's a
> subdirectory there), the tree objects themselves sort a bit oddly. The
> tree objects themselves will look like
>
>     top-level tree:
>         100644 a.c -> blob1
>         040000 a -> tree2
>         100644 abc -> blob3
>
>     sub-tree:
>         100644 c -> blob2
>
> and notice how the *tree* is not sorted alphabetically at all. It has a
> subtly different sort, where the entry "a" sorts *after* the entry "a.c",
> because we know that it's a tree entry, and thus will (in the *global*
> order) sort as if it had a "/" at the end!
>
> Traditionally, when we have the index, the index sorting has been very
> simple: you just sort the names as memcmp() would sort them. But note how
> that changes, if "a" is an empty directory. Now the index needs to sort as
>
>     file a.c
>     dir a
>     file abc
>
> because when we create the tree entry, it needs to be sorted the same way
> all tree entries are always sorted - as if "a" had a slash at the end!
Here is the layout as I would scheme it:

    tree1:
        0?0000 . -> dir1
        100644 a.c -> blob1
        040000 a -> tree2
        100644 abc -> blob3

    sub-tree:
        0?0000 . -> dir2
        100644 c -> blob2

Remember that a tree evaporates when it is empty, and if we don't want to mess with that (which appears like a good idea to me), the "don't delete this" indication belongs in the subtree where its natural name is ".".

Since the dir entries are _leaves_ in the tree, there is no necessity for sorting them specially. They will usually appear first, but people do all sorts of things, so filenames starting with "!" might still come before them. So the sorted flat file list for the above would be

    .    [dir]
    a.c  [file]
    a/   [tree]
    a/.  [dir]
    a/c  [file]
    abc  [file]

Note that a tree is basically just a string arrangement tool which gets only incidentally mapped to directories when checking out. So I am quite unhappy that 040000 is already taken by it. I can't even say, "ok, let . look like an empty tree" because there should not be something like an empty tree! I find the correlation empty->gone very important.

> [ Yeah, yeah, we could make a special case and just say "the empty
> tree sorts differently", but that actually results in huge problems
> when doing a "diff" between two trees: our diff machinery very much
> depends on the fact that the index and the trees always sort the
> same way, and if we sorted the "a" entry (when it is an empty
> directory) differently from the "a" entry (when it has entries in
> it), that would just be insane and cause no end of trouble for
> comparing two trees - one with an empty directory and one with
> content added to that directory.

It appears to me like our ideas are still out of sync: a directory under my scheme is _not_ at all an empty tree, rather it is an entry _inside_ of a tree, making the tree non-empty (which means that git will not be tempted to delete the corresponding real-world directory _until_ one deletes the directory entry keeping the tree alive).
> So the sorting is doubly important: it's what makes "one content" > always have the same SHA1, but it is also much easier and > efficient to compare directories when we know they are sorted the > same way. ] > > It's *probably* just a few lines of code, and it actually would > result in some nice changes ("git ls-files" would show a '/' at the > end of an empty directory entry, for example), so this is not a big > deal, but it's an example of how subtly different a directory is > from a file when it comes to git. Linus, a directory is simply non-existent inside of git. Trees are an indexing mechanism solely determined by their content. That is not a subtle difference. Git _uses_ directories when exporting in order to simulate a flat namespace. But it is internally oblivious to their existence. And that is a perfectly elegant and reasonable approach and I like it very much and don't want to mess with it at all. But I also want to have directories represented within git, because not doing so leads to awkward problems. And the proper way as I see it is _not_ to mess with trees and stick them with "stay when empty" flags or similar. This messes up the whole elegance of git's flat name space. The proper way is to create a distinct object that represents a physical directory. We don't need to represent the contents of it: those are already tracked in the flat namespace fine, with trees serving as an implementation detail. All we need to represent is ".". So git-ls-files on . [dir] a.c [file] a/ [tree] a/. [dir] a/c [file] abc [file] should likely list . a.c a/. a/c abc If one wants to see the _tree_ because of its SHA1, it may also be listed. The SHA1 of a _directory_ like a/., in contrast, is uninteresting: it will be the same for every directory. Whether the _tree_ is listed as "a" or "a/" is probably a matter of taste. 
Personally, I think "a/" is better for bringing across the notion that it is a structuring device not really related to the physical _directory_ a which is _identical_ (meaning inode-identical, which is what counts in the physical world) to "a/." even though it is another name of it. And using "a/" puts it closer to its natural sort order. I'd write up a philosophy paper about git's relation between trees, files, directories if that were not utterly preposterous. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
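As a quick check of the claim that the "." entries need no special sorting, the flattened listing from this message can be compared against a plain byte sort. This is a sketch only; the listing is copied from the message, and the name "!x" is a made-up example of a file starting with "!".

```python
# Sketch: the flattened listing proposed in the message, as written there.
listing = [".", "a.c", "a/", "a/.", "a/c", "abc"]
as_bytes = [p.encode("ascii") for p in listing]

# True if the listing is already in plain byte (memcmp-style) order.
is_sorted = as_bytes == sorted(as_bytes)
print(is_sorted)

# "!" is 0x21 and "." is 0x2e, so a name starting with "!" sorts first.
bang_first = sorted(as_bytes + [b"!x"])[0] == b"!x"
print(bang_first)
```

Under this scheme the trailing "/" on the tree name and the "/." on the directory entry both fall into place with an ordinary byte comparison.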
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 17:38 ` David Kastrup @ 2007-07-21 17:52 ` Simon 'corecode' Schubert 2007-07-21 18:08 ` David Kastrup 2007-07-21 23:50 ` Linus Torvalds 2007-07-22 4:00 ` Brian Gernhardt 2 siblings, 1 reply; 137+ messages in thread From: Simon 'corecode' Schubert @ 2007-07-21 17:52 UTC (permalink / raw) To: David Kastrup; +Cc: Linus Torvalds, git David Kastrup wrote: > But I also want to have directories represented within git, because > not doing so leads to awkward problems. And the proper way as I see > it is _not_ to mess with trees and stick them with "stay when empty" > flags or similar. This messes up the whole elegance of git's flat > name space. The proper way is to create a distinct object that > represents a physical directory. We don't need to represent the > contents of it: those are already tracked in the flat namespace fine, > with trees serving as an implementation detail. > > All we need to represent is ".". What I still don't get is: How do you carry this information about "this directory should not be removed" from one checkout to the next commit? When creating a .gitignore, this file exists in the workdir. Of course you add some data to the index to stage it. But how does this work with your "." "file"? You can't put that in the filesystem. cheers simon -- Serve - BSD +++ RENT this banner advert +++ ASCII Ribbon /"\ Work - Mac +++ space for low €€€ NOW!1 +++ Campaign \ / Party Enjoy Relax | http://dragonflybsd.org Against HTML \ Dude 2c 2 the max ! http://golden-apple.biz Mail + News / \ ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 17:52 ` Simon 'corecode' Schubert @ 2007-07-21 18:08 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-21 18:08 UTC (permalink / raw) To: Simon 'corecode' Schubert; +Cc: Linus Torvalds, git

Simon 'corecode' Schubert <corecode@fs.ei.tum.de> writes:

> David Kastrup wrote:
>> But I also want to have directories represented within git, because
>> not doing so leads to awkward problems. And the proper way as I see
>> it is _not_ to mess with trees and stick them with "stay when empty"
>> flags or similar. This messes up the whole elegance of git's flat
>> name space. The proper way is to create a distinct object that
>> represents a physical directory. We don't need to represent the
>> contents of it: those are already tracked in the flat namespace fine,
>> with trees serving as an implementation detail.
>>
>> All we need to represent is ".".
>
> What I still don't get is: How do you carry this information about
> "this directory should not be removed" from one checkout to the next
> commit?

I don't. The only information in the file system is whether a directory exists or not. "Should not be removed" is not a property that is tracked.

> When creating a .gitignore, this file exists in the workdir. Of
> course you add some data to the index to stage it. But how does
> this work with your "." "file"? You can't put that in the
> filesystem.

Either the directory is in the file system or it is not. Like with every other file. And either git tracks the directory, in which case it will notice its addition (when doing git-add) and removal (when doing git-rm or git-commit -a) or git doesn't track the directory.

When git tracks the directory (a matter of gitignore settings for implicit tracking, and git-add for explicit tracking), and considers it existent, it will not touch it. If it tracks it but considers it removed in a particular commit, it will attempt to remove it.
Fineprint: actually, things are more involved here: git does not actually attempt to remove directories at the time it deletes them from the tree: this is sort of pointless since the sort order means that there might still be files it needs to take out from the physical directory. Instead, like before, git attempts to remove a physical directory whenever the corresponding tree in git becomes empty, and it is a prerequisite to delete a possibly tracked directory from it. After it has attempted to remove it, it will leave it alone since it is now no longer tracking it.

If you add and remove a contained file, it will again try to remove the directory. If you add _both_ directory and a contained file, just removing the contained file will not make git attempt to delete the directory.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 17:38 ` David Kastrup 2007-07-21 17:52 ` Simon 'corecode' Schubert @ 2007-07-21 23:50 ` Linus Torvalds 2007-07-22 0:18 ` David Kastrup 2007-07-22 0:34 ` David Kastrup 2007-07-22 4:00 ` Brian Gernhardt 2 siblings, 2 replies; 137+ messages in thread From: Linus Torvalds @ 2007-07-21 23:50 UTC (permalink / raw) To: David Kastrup; +Cc: git On Sat, 21 Jul 2007, David Kastrup wrote: > > tree1: > 0?0000 . -> dir1 > 100644 a.c -> blob1 > 040000 a -> tree2 > 100644 abc -> blob3 No. Totally broken. That "." entry not only doesn't buy you anything, it is *impossible*. You cannot make an object point to itself. Not possible. Tell me how to calculate the SHA1 for the result. Also, tell me what the *point* is. There is none. > Linus, a directory is simply non-existent inside of git. You need to learn git first. A directory doesn't exist IN THE INDEX (until my patches). But you need to learn about the object database and the SHA1's. That's the real meat of git, and it sure as hell knows about directories. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
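The objection about calculating the SHA1 can be made concrete with a small sketch of how a tree's id depends on the ids of its entries. The framing used here (a "<type> <size>" header plus NUL, and entries of the form mode, space, name, NUL, raw 20-byte id) follows the entry layout given earlier in the thread; the helper names `git_object_id` and `tree_id`, and the file contents, are invented for the example, so treat it as a sketch rather than git's code.

```python
import hashlib

def git_object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' + NUL + body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_id(entries) -> str:
    """entries: (mode, name, child_hex_id) tuples, assumed already in tree order."""
    body = b"".join(
        mode + b" " + name + b"\0" + bytes.fromhex(child)
        for mode, name, child in entries
    )
    return git_object_id(b"tree", body)

# Made-up contents for the three files from the example in the thread.
blob1 = git_object_id(b"blob", b"contents of a.c\n")
blob2 = git_object_id(b"blob", b"contents of a/c\n")
blob3 = git_object_id(b"blob", b"contents of abc\n")

tree2 = tree_id([(b"100644", b"c", blob2)])
top = tree_id([
    (b"100644", b"a.c", blob1),
    (b"40000", b"a", tree2),  # the tree mode is stored without the leading zero
    (b"100644", b"abc", blob3),
])

# Same entries in a different order give different bytes, hence a different id.
shuffled = tree_id([
    (b"40000", b"a", tree2),
    (b"100644", b"a.c", blob1),
    (b"100644", b"abc", blob3),
])

print(top)
print(top != shuffled)
```

Every child id is an input to the parent's id, so an entry that pointed back at the tree containing it would need that tree's id before it could be computed. An entry pointing at a separate, fixed object does not have that problem.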
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 23:50 ` Linus Torvalds @ 2007-07-22 0:18 ` David Kastrup 2007-07-22 0:37 ` Linus Torvalds 2007-07-22 1:16 ` Jakub Narebski 2007-07-22 0:34 ` David Kastrup 1 sibling, 2 replies; 137+ messages in thread From: David Kastrup @ 2007-07-22 0:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Sat, 21 Jul 2007, David Kastrup wrote: >> >> tree1: >> 0?0000 . -> dir1 >> 100644 a.c -> blob1 >> 040000 a -> tree2 >> 100644 abc -> blob3 > > No. Totally broken. That "." entry not only doesn't buy you > anything, it is *impossible*. You cannot make an object point to > itself. Not possible. It does not point to itself. The name "." points to an entry of type "dir", no content is involved. trees in the repository have content, and _only_ content. directories in the repository imply existence, and _only_ existence. > Tell me how to calculate the SHA1 for the result. Since "." has no content (as long as we don't decide to track any file permissions at one point of time), _all_ entries "." will have the same SHA1. > Also, tell me what the *point* is. There is none. The point is to have a reflection of the physical existence of a directory. Not just as a manner of accommodating slashes in a flat filespace, allowing certain slash-related operations to be carried out efficiently. >> Linus, a directory is simply non-existent inside of git. > > You need to learn git first. > > A directory doesn't exist IN THE INDEX (until my patches). But you > need to learn about the object database and the SHA1's. That's the > real meat of git, and it sure as hell knows about directories. I have written up a complete explanation about the underlying concept in a separate thread, maybe it would make sense reading that before investing too much time meddling over details that don't fit the large picture. 
The point is that the object database and the SHA1 values track _trees_, not _directories_. And a _tree_ is just a hashing mechanism in the repository for files. Its existence is solely dependent on the existence of its contents. The only synchronization with directories is that when a tree becomes empty, git attempts to do an rmdir on the corresponding directory. And of course, if git needs to check out a file, it creates the necessary parent directories. Now since the physical _contents_ of a directory are already tracked in _trees_ by git, the only missing part is the _existence_ of the directory itself: a directory must exist as long as there is a tree (and thus content) connected with it, but the reverse does not hold: without a tree, the directory can still exist. Which we can represent by a repository entry named "." without content (the content is already catered for by the _tree_). This must _not_ be represented by a _tree_ node since there is no content, and a tree without content by _definition_ does not exist. I must be really bad at explaining things, or I am losing a fight against preconceptions fixed beyond my imagination. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
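The claim that every content-free "." entry would share one SHA1 follows from content addressing in general: identical bytes always hash to the same id. The sketch below shows this with the two existing object kinds rather than the proposed "dir" type, which does not exist in git; `git_object_id` is a helper name made up for the example.

```python
import hashlib

def git_object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' + NUL + body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# An object with no content has a fixed id, wherever it appears.
empty_tree = git_object_id(b"tree", b"")
empty_blob = git_object_id(b"blob", b"")

print(empty_tree)
print(empty_blob)
```

Whatever representation is chosen for a tracked directory, an entry that carries no content of its own would behave the same way: one id, shared by every occurrence.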
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 0:18 ` David Kastrup @ 2007-07-22 0:37 ` Linus Torvalds 2007-07-22 1:05 ` David Kastrup 2007-07-22 1:16 ` Jakub Narebski 1 sibling, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-22 0:37 UTC (permalink / raw) To: David Kastrup; +Cc: git

On Sun, 22 Jul 2007, David Kastrup wrote:
>
> I must be really bad at explaining things, or I am losing a fight
> against preconceptions fixed beyond my imagination.

I really don't see the point. But hey, code talks.

Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 0:37 ` Linus Torvalds @ 2007-07-22 1:05 ` David Kastrup 2007-07-22 1:41 ` Linus Torvalds 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-22 1:05 UTC (permalink / raw) To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Jul 2007, David Kastrup wrote:
>>
>> I must be really bad at explaining things, or I am losing a fight
>> against preconceptions fixed beyond my imagination.
>
> I really don't see the point. But hey, code talks.

Yes, I am working on that. It would have been nice if IS_DIR was not already taken by trees, but one can't have everything. So I need to decide how to represent the node, and it would appear that I need to angle for "file" after all, since it is really much closer to a file or symlink than to a tree or project.

Hm, perhaps a symlink might be more expedient. Make it have an empty reference, and it is unique. And there will be fewer places in the code manipulating symlinks than files.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 1:05 ` David Kastrup @ 2007-07-22 1:41 ` Linus Torvalds 2007-07-22 2:39 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-22 1:41 UTC (permalink / raw) To: David Kastrup; +Cc: git

On Sun, 22 Jul 2007, David Kastrup wrote:
> Make it have an empty reference, and it is unique.

I *really* don't see the point.

And you seem to have ignored totally my treatise on "content" and how the stuff git tracks must be stuff that is visible and detectable in the trees.

And if I understand you correctly, you also wouldn't be backwards compatible.

IOW, there's a lot of "why's" at all levels.

I don't see the *point*. What's the problem you're trying to solve?

Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 1:41 ` Linus Torvalds @ 2007-07-22 2:39 ` David Kastrup 2007-07-22 3:43 ` Linus Torvalds 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-22 2:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Sun, 22 Jul 2007, David Kastrup wrote: >> Make it have an empty reference, and it is unique. > > I *really* don't see the point. > > And you seem to have igored totally my treatise on "content" and how > the stuff git tracks must be stuff that is visible and detectable in > the trees. Oh please. Just because you refuse to read a point-to-point reply does not mean it has not been made. "." _is_ visible and detectable in every tree. But that does not mean it is automatically tracked by git unless it gets added explicitly, or implicitly (as long as the gitignore mechanism does not kick in) by adding a higher level directory. If a file does not get added explicitly or implicitly, it does not end up in the repository and git behaves like it knows nothing about it. And that's just the way it is going to be with directories. Nothing more, nothing less, nothing new. > And if I understand you correctly, you also wouldn't be backwards > compatible. Define backwards compatible. Anyway, you are the repository wizard: here are the semantics I need supported for backwards compatibility: I need an entry type in the index and in the repository with the following features: a) if part of a tree, the tree is not considered empty. Should be easy. b) it has the name ".". This is not absolutely necessary, but it means that the gitignore mechanism can be used for dealing with it, and that's intuitive and has exactly the expressive power required for the job. 
Now the gitignore mechanism is isolated very locally in dir.c: whether one makes the actual representation in the repository based on an attribute like "filemode" rather than on a separate entry does not actually complicate the code all too much. There is, however, some level of complication since the consulted .gitignore file for ignoring "." must, of course, be the .gitignore file situated _in_ the directory. So making "." sit _in_ the tree rather than _on_ the tree simplifies the code considerably. It is a small amount of code, nevertheless, so it is not a major strategic decision.

One conceivable implementation would be indeed similar to what the "filemode" thing does: let us keep open the option to track permissions at some point. The current format has, as far as I understand, all zeros in the permissions field of trees (I have not checked, though). Now if we stipulate that this is the kind of directory permissions we will in all eternity _not_ support outside of git, we are all set with regard to backwards compatibility: a tree with permissions all zero will behave as previously: it will get removed when it becomes empty (taking the corresponding work tree directory with it, if possible). And that's it.

But a tree with nonzero permissions (whether they correspond to outward permissions or are just a placeholder) will _not_ evaporate when becoming empty. It will be possible to explicitly or implicitly delete it: that will just set its permissions all to zero so that it has the chance to evaporate next time it becomes empty.

> IOW, there's a lot of "why's" at all levels.
>
> I don't see the *point*. What's the problem you're trying to solve?

    rm -rf ./*
    git-commit -m "all empty" -a
    unzip /tmp/something-with-empty-dirs.zip
    git-add .
    git-commit -m "something-with-empty-dirs"
    git-checkout HEAD~1
    # Now I don't want empty directories and their parents lying around.
    git-checkout master
    # Now the state after unzip should be restored faithfully
    rm -rf ./*
    unzip /tmp/something-else-with-empty-dirs
    git-commit -a -m "something-else"
    # Now I want to have the state of something-else registered faithfully
    # even if it contains top-level files and directories not present in
    # something-with-empty-dirs, because supposedly . is being tracked,
    # not just every file element in it.

Actually, oops. This last criterion is not met when .'s relation to the tree is such that it is only considered _part_ of the tree. Looks like it might be prudent to focus on the permissions-coupled representation.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 2:39 ` David Kastrup @ 2007-07-22 3:43 ` Linus Torvalds 2007-07-22 4:28 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2007-07-22 3:43 UTC (permalink / raw) To: David Kastrup; +Cc: git

On Sun, 22 Jul 2007, David Kastrup wrote:
>
> "." _is_ visible and detectable in every tree.

I'm going to add you to my "clueless" filter, because it's not worth my time to answer you any more.

I told you. Several times. That "." is pointless exactly because it's in _every_ tree, and as such is no longer "content". It's not something that the user can care about, because it has no meaning. There's no point in tracking it, because even if we do *not* track it, it's there, and we cannot do anything about it.

That was the whole difference between "." and ".gitignore", and I explicitly pointed out that that was the difference (and the _only_ one), and why it mattered.

And you didn't listen. And now you claim that I don't read your emails. I do. They just don't make any sense.

Consider this discussion ended. I simply don't care any more.

Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 3:43 ` Linus Torvalds @ 2007-07-22 4:28 ` David Kastrup 2007-07-22 6:38 ` david ` (3 more replies) 0 siblings, 4 replies; 137+ messages in thread From: David Kastrup @ 2007-07-22 4:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Jul 2007, David Kastrup wrote:
>>
>> "." _is_ visible and detectable in every tree.
>
> I'm going to add you to my "clueless" filter, because it's not worth
> my time to answer you any more.

Too bad I can't do the same.

> I told you. Several times. That "." is pointless exactly because
> it's in _every_ tree, and as such is no longer "content".

"." is in every _non-empty_ directory tree. But we are talking about permitting _empty_ trees in the repository. And for an empty tree in the repository, "." may or may not be in the corresponding work directory tree, depending on whether the directory exists or not.

So when we are talking about a repository tree _becoming_ empty, we need the information whether or not we should remove it upon becoming empty. _That_ is the information content of "." being or not being considered part of the trackable material. And the information is no longer available at the time the repository tree becomes empty _unless_ we already store it there when the tree is still populated.

> It's not something that the user can care about, because it has no
> meaning. There's no point in tracking it, because even if we do
> *not* track it, it's there, and we cannot do anything about it.

Ok, here we go _again_.

Test case 1:

    mkdir a
    touch a/b
    git-add a/b
    git-commit -m x
    git-rm a/b
    git-commit -m x

Now we want to have the directory a _removed_.

Test case 2:

    mkdir a
    touch a/b
    git-add a
    git-commit -m x
    git-rm a/b
    git-commit -m x

Now we want to have the directory a _retained_.

After the first commit in _both_ test cases, the only file in the trees / and /a is a/b.
The working directory state is _identical_ at this point, and we do identical commands afterwards. The end result is not identical, so there must be some information different in the repository after the first commit. This information _can't_ be encoded in a remaining empty tree, because both the trees / and /a are _non_-empty yet. So we _must_ encode the evaporate-or-not-when-empty information _otherwise_ into the repository. And we do that by _not_ having /a/. in the set of tracked files in test case 1, and by _having_ it in the set of tracked files in test case 2. > That was the whole difference between "." and ".gitignore", and I > explicitly pointed out that that was the difference (and the _only_ > one), and why it mattered. You are underestimating the power of ".gitignore": while it is true that its _physical_ presence will reliably keep git from removing the directory, its physical presence is not _actually_ required. It is sufficient that git _believes_ in its continuing physical existence. And if we tell it "it is still there" whenever it takes a look, then git will keep the record of .gitignore in its tree, and consequently won't remove the tree and not try deleting the directory. However, once we explicitly tell it "remove the record of .gitignore from the repository", it will do so, and in the course of doing so remove the directory in the work directory together with the tree in the repository. >From a user interface and logical standpoint, adding or not adding "." to the tracked content is a perfectly consistent and convenient way of having the directory kept around or not. >From the viewpoint of the internal data structures, I'll likely go with tampering with (pseudo-)permissions. > And you didn't listen. And now you claim that I don't read your > emails. I do. They just don't make any sense. > > Consider this discussion ended. I simply don't care any more. 
It is painfully clear that I could invest a few weeks of time in coding better than in explaining stuff. And I guess that's what I'll have to do. And afterwards it will be your job to wrack your head about why something does all the right things for the wrong reasons and come up with a different explanation how and why the code works. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
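The claim in the two test cases above, that the repository holds identical information after the first commit whether the user ran "git-add a/b" or "git-add a", can be illustrated with a toy model. This is only a sketch under stated assumptions: the functions `add` and `snapshot` are hypothetical stand-ins, not git's implementation, and the model assumes an index that records file paths only.

```python
import hashlib

def add(index: set, worktree: dict, path: str) -> None:
    """Toy 'git add': record every file at or below `path`.

    Directories themselves leave no trace in this blob-only index (assumption).
    """
    for file_path in worktree:
        if file_path == path or file_path.startswith(path + "/"):
            index.add(file_path)

def snapshot(index: set, worktree: dict) -> str:
    """Toy commit: hash the sorted (path, content) pairs recorded in the index."""
    h = hashlib.sha1()
    for p in sorted(index):
        h.update(p.encode() + b"\0" + worktree[p] + b"\0")
    return h.hexdigest()

worktree = {"a/b": b""}          # mkdir a; touch a/b

index_1 = set()
add(index_1, worktree, "a/b")    # test case 1: git-add a/b

index_2 = set()
add(index_2, worktree, "a")      # test case 2: git-add a

# Both snapshots are built from exactly the same recorded paths.
same = snapshot(index_1, worktree) == snapshot(index_2, worktree)
print(same)
```

In this model nothing survives the commit that could tell a later "git-rm a/b" whether directory a should be kept, which is the gap the message argues must be filled by some extra piece of recorded state.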
* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  4:28 ` David Kastrup
@ 2007-07-22  6:38 ` david
  2007-07-22  9:08 ` David Kastrup
  2007-07-22 17:28 ` Linus Torvalds
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 137+ messages in thread
From: david @ 2007-07-22  6:38 UTC (permalink / raw)
  To: David Kastrup; +Cc: Linus Torvalds, git

On Sun, 22 Jul 2007, David Kastrup wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> On Sun, 22 Jul 2007, David Kastrup wrote:
>>>
>>> "." _is_ visible and detectable in every tree.
>>
>> I'm going to add you to my "clueless" filter, because it's not worth
>> my time to answer you any more.
>
> Too bad I can't do the same.
>
>> I told you. Several times. That "." is pointless exactly because
>> it's in _every_ tree, and as such is no longer "content".
>
> "." is in every _non-empty_ directory tree. But we are talking about
> permitting _empty_ trees in the repository. And for an empty tree in
> the repository, "." may or may not be in the corresponding work
> directory tree, depending on whether the directory exists or not. So
> when we are talking about a repository tree _becoming_ empty, we need
> the information whether or whether not we should remove it upon
> becoming empty. _That_ is the information content of "." being or not
> being considered part of the trackable material. And the information
> is no longer available at the time the repository tree becomes empty
> _unless_ we already store it there when the tree is still populated.

David, the point where you and Linus are talking past each other is that
Linus is assuming that you only want to track some specific directories, and
for that, tracking "." doesn't work because it's in every directory.

You apparently consider every directory equal, and therefore the fact that
"." exists in every directory doesn't bother you, because you want to track
every directory.

What you are not hearing is that while Linus and the other git developers can
see reasons to track directories sometimes, they definitely don't agree that
you want to track directories all the time.

Sometimes the fact that a directory exists is significant; most of the time
it's not. And the difference between what is and what isn't significant isn't
a per-repository or per-project thing, it's a per-directory thing.

In one repository you will have some directories that only exist because
files are in them, and you may have some directories that exist because you
explicitly want them to exist.

Both types have the "." file in them (or appear to; some OSes/filesystems
don't actually have a "." on disk, they add it when needed when reporting to
userspace), so git has no way to tell which ones you explicitly want tracked.

Creating .gitignore in the directories that you want tracked lets the other
directories not be tracked.

David Lang

>> It's not something that the user can care about, because it has no
>> meaning. There's no point in tracking it, because even if we do
>> *not* track it, it's there, and we cannot do anything about it.
>
> Ok, here we go _again_. Test case 1:
>
>   mkdir a
>   touch a/b
>   git-add a/b
>   git-commit -m x
>   git-rm a/b
>   git-commit -m x
>
> Now we want to have the directory a _removed_.
>
> Test case 2:
>
>   mkdir a
>   touch a/b
>   git-add a
>   git-commit -m x
>   git-rm a/b
>   git-commit -m x
>
> Now we want to have the directory a _retained_.
>
> After the first commit in _both_ test cases, the only file in the
> trees / and /a is a/b. The working directory state is _identical_ at
> this point, and we do identical commands afterwards.
>
> The end result is not identical, so there must be some information
> different in the repository after the first commit. This information
> _can't_ be encoded in a remaining empty tree, because both the trees /
> and /a are _non_-empty yet.
> > So we _must_ encode the evaporate-or-not-when-empty information > _otherwise_ into the repository. And we do that by _not_ having > /a/. in the set of tracked files in test case 1, and by _having_ it in > the set of tracked files in test case 2. > >> That was the whole difference between "." and ".gitignore", and I >> explicitly pointed out that that was the difference (and the _only_ >> one), and why it mattered. > > You are underestimating the power of ".gitignore": while it is true > that its _physical_ presence will reliably keep git from removing the > directory, its physical presence is not _actually_ required. > > It is sufficient that git _believes_ in its continuing physical > existence. And if we tell it "it is still there" whenever it takes a > look, then git will keep the record of .gitignore in its tree, and > consequently won't remove the tree and not try deleting the directory. > However, once we explicitly tell it "remove the record of .gitignore > from the repository", it will do so, and in the course of doing so > remove the directory in the work directory together with the tree in > the repository. > > From a user interface and logical standpoint, adding or not adding "." > to the tracked content is a perfectly consistent and convenient way of > having the directory kept around or not. > > From the viewpoint of the internal data structures, I'll likely go > with tampering with (pseudo-)permissions. > >> And you didn't listen. And now you claim that I don't read your >> emails. I do. They just don't make any sense. >> >> Consider this discussion ended. I simply don't care any more. > > It is painfully clear that I could invest a few weeks of time in > coding better than in explaining stuff. And I guess that's what I'll > have to do. And afterwards it will be your job to wrack your head > about why something does all the right things for the wrong reasons > and come up with a different explanation how and why the code works. 
> > ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 6:38 ` david @ 2007-07-22 9:08 ` David Kastrup 2007-07-22 17:30 ` Linus Torvalds 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-22 9:08 UTC (permalink / raw) To: david; +Cc: Linus Torvalds, git david@lang.hm writes: > On Sun, 22 Jul 2007, David Kastrup wrote: > >> Linus Torvalds <torvalds@linux-foundation.org> writes: >> >>> I told you. Several times. That "." is pointless exactly because >>> it's in _every_ tree, and as such is no longer "content". >> >> "." is in every _non-empty_ directory tree. But we are talking >> about permitting _empty_ trees in the repository. And for an empty >> tree in the repository, "." may or may not be in the corresponding >> work directory tree, depending on whether the directory exists or >> not. So when we are talking about a repository tree _becoming_ >> empty, we need the information whether or whether not we should >> remove it upon becoming empty. _That_ is the information content >> of "." being or not being considered part of the trackable >> material. And the information is no longer available at the time >> the repository tree becomes empty _unless_ we already store it >> there when the tree is still populated. > > David, the point where you and Linus are talking past each other is > that Linus is assuming that you only want to track some specific > directories, and for that tracking "." doesn't work becouse it's in > every directory > > you apparently consider every directory equal and therefor the fact > that "." exists in every directory doesn't bother you becouse you > want to track every directory. Sigh. No, I don't want to track every directory. I want to have every directory _trackable_. Whether it is _tracked_ depends on whether you _add_ it to the index. And that depends, among other things, on the gitignore patterns, and those can be specified on a per-directory, per-project, per-user preference. 
> what you are not hearing is that while Linus and the other git > developers can see reasons to track directories sometimes, they > definantly don't agree that you want to track directories all the > time. And that is why one can use per-directory, per-project and per-user settings to turn the tracking off, _and_ one can decide at what level one adds information to the index. If you always make it a habit to only ever use git-add -f and git-rm -f on _files_ and never on directories, you won't _ever_ see a difference on whether directories are tracked, and the contents of .gitignore won't make a difference, either. But if you use git-add and git-rm on directories, then for the specified directory and its children, .gitignore gets consulted. > sometimes the fact that a directory exists is significant, most of > the time it's not. and the difference between what is and what isn't > significant isn't a per-repository or per-project thing, it's a > per-directory thing. Which is why one can control it per-directory using either the .gitignore mechanism _or_ by including the directory level in question in the git-add and git-rm commands or not. > in one repository you will have some directories that only exist > becouse files are in them, and you may have some directories that > exist becouse you explicitly want them to exist. > > both types have the "." file in them (or appear to, some > OS's/filesystems don't actually have a "." on disk, they add it when > needed when reporting to userspace), so git has no way to tell which > ones you explicitly want tracked. Like with any other file, git _has_ a way to tell. If I don't git-add or git-rm the directory or one of its parents to the index, I don't want to have it tracked. And if I add the directory or one of its parents to the index recursively, but it is covered by .gitignore, I don't want to have it tracked. 
It is a pity that you have seemingly not read on, because there follows a simple example: >> Ok, here we go _again_. Test case 1: >> >> mkdir a >> touch a/b >> git-add a/b >> git-commit -m x >> git-rm a/b >> git-commit -m x >> >> Now we want to have the directory a _removed_. >> >> Test case 2: >> >> mkdir a >> touch a/b >> git-add a >> git-commit -m x >> git-rm a/b >> git-commit -m x >> >> Now we want to have the directory a _retained_. >> >> After the first commit in _both_ test cases, the only file in the >> trees / and /a is a/b. The working directory state is _identical_ at >> this point, and we do identical commands afterwards. >> >> The end result is not identical, so there must be some information >> different in the repository after the first commit. This information >> _can't_ be encoded in a remaining empty tree, because both the trees / >> and /a are _non_-empty yet. >> >> So we _must_ encode the evaporate-or-not-when-empty information >> _otherwise_ into the repository. And we do that by _not_ having >> /a/. in the set of tracked files in test case 1, and by _having_ it in >> the set of tracked files in test case 2. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  9:08 ` David Kastrup
@ 2007-07-22 17:30 ` Linus Torvalds
  2007-07-22 17:59 ` David Kastrup
  0 siblings, 1 reply; 137+ messages in thread
From: Linus Torvalds @ 2007-07-22 17:30 UTC (permalink / raw)
  To: David Kastrup; +Cc: david, git

On Sun, 22 Jul 2007, David Kastrup wrote:
>
> Sigh. No, I don't want to track every directory. I want to have
> every directory _trackable_.

And they already are.

Your point is pointless. You don't understand the git data structures, and
you are trying to do something that makes no sense.

                Linus

^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 17:30 ` Linus Torvalds @ 2007-07-22 17:59 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-22 17:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: david, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Sun, 22 Jul 2007, David Kastrup wrote: >> >> Sigh. No, I don't want to track every directory. I want to have >> every directory _trackable_. > > And they already are. Their contents are. > Your point is pointless. You don't understand the git data > structures, and you are trying to do something that makes no sense. That makes no sense to you and apparently quite a few other people, after a lot of explaining. That does not mean that it wouldn't work, but it does mean that it is going nowhere: it is irrelevant whether I consider the concept easy to understand and explain when nobody else does: that makes it unmaintainable. Fortunately, a few other participants, notably Junio and Jakub, have focused a bit more on technical details rather than my sanity in their somewhat more nuanced feedback, and thus I have (in a separate thread) made a new proposal that addresses a few technical shortcomings and that does no longer require splitting tree-ness/directory-ness into separate concepts and records, something which I considered elegant and others gibberish. It boils down to encoding the "don't-evaporate-when-empty" or "I told you to keep track of it" property in the directory access permissions: if those are zero, git does not track the corresponding directory and will attempt a remove-on-empty. If they are non-zero (probably 755 as long as git stores only a sanitized version of the actual state there), this means that git has been told to track the directory and will not attempt to delete it until it is told to stop tracking it again. The proposal of allowing "." "!." 
as a gitignore pattern to specify the tracking/non-tracking indicator does still stand, but its semantics are now so much decoupled from that of "don't-evaporate-when-empty" that the code would not actually overlap with that of the tracking, and so discussing it is orthogonal to the actual proposal and can be postponed separately, and an implementation proferred separately once the rest is in place. So do both of us a favor and skip the rest of the mail queue with "Empty directories..." in its title. Actually, the code (and later comments for it) you produced matches the areas of work and what I think needs to be done quite closer now than with my original proposal. So while the discussion with you has not really been much of a help except to show without reasonable doubt that my original approach would have been unmaintainable by other persons, the code _is_ very helpful. Thanks, -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
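The revised proposal above, where a zero directory mode means "remove on empty" and a non-zero mode (probably 755) means "tracked, keep until told otherwise", might be modelled roughly as follows. This is a hedged sketch only: `remove_file`, `TRACKED_MODE`, `UNTRACKED_MODE` and the dictionary layout are hypothetical names for illustration, and nothing here reflects how git actually stores or prunes directories.

```python
import posixpath

TRACKED_MODE = 0o755    # assumption: "I told you to keep track of it"
UNTRACKED_MODE = 0      # assumption: "evaporate when empty"

def remove_file(files: set, dir_modes: dict, path: str) -> None:
    """Remove a file, then prune parents that are empty and not marked tracked."""
    files.discard(path)
    parent = posixpath.dirname(path)
    while parent:
        has_files = any(f.startswith(parent + "/") for f in files)
        has_subdirs = any(d.startswith(parent + "/") for d in dir_modes)
        tracked = dir_modes.get(parent, UNTRACKED_MODE) != UNTRACKED_MODE
        if has_files or has_subdirs or tracked:
            break                      # directory stays
        dir_modes.pop(parent, None)    # directory evaporates
        parent = posixpath.dirname(parent)

# Test case 1: only a/b was added, so a carries mode 0.
files_1, modes_1 = {"a/b"}, {"a": UNTRACKED_MODE}
remove_file(files_1, modes_1, "a/b")

# Test case 2: a itself was added, so a carries a non-zero mode.
files_2, modes_2 = {"a/b"}, {"a": TRACKED_MODE}
remove_file(files_2, modes_2, "a/b")

print("a" in modes_1, "a" in modes_2)
```

The point of the sketch is that the keep-or-remove decision lives in the parent's record of the directory, so it is available while the directory is still populated and does not require a separate "." entry.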
* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  4:28 ` David Kastrup
  2007-07-22  6:38 ` david
@ 2007-07-22 17:28 ` Linus Torvalds
  2007-07-22 17:33 ` Linus Torvalds
  [not found] ` <alpine.LFD.0.999.0707221031050.3607@woody.linux-foundation.org>
  3 siblings, 0 replies; 137+ messages in thread
From: Linus Torvalds @ 2007-07-22 17:28 UTC (permalink / raw)
  To: David Kastrup; +Cc: git

On Sun, 22 Jul 2007, David Kastrup wrote:
>
> > I told you. Several times. That "." is pointless exactly because
> > it's in _every_ tree, and as such is no longer "content".
>
> "." is in every _non-empty_ directory tree.

You're pointless.

We have no problems at all with non-empty trees. We know exactly what they
are. We keep track of them fine, and we do not need a totally pointless "."
entry for them.

> But we are talking about
> permitting _empty_ trees in the repository.

And WE ALREADY DO.

The empty tree looks like this: "". It has a SHA1 of
4b825dc642cb6eb9a060e54bf8d69288fbee4904. It works today, and in fact, git
uses it already. Try this:

    git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904

in the git repository. What do you think that is?

Your "." is *pointless*. And it's _worse_ than pointless: it's not "content".
It doesn't add any information. It's not something you can match up against
the working tree meaningfully, exactly because *every* working tree has it.

As such, it's total non-information.

                Linus

^ permalink raw reply [flat|nested] 137+ messages in thread
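The SHA1 quoted above for the empty tree can be reproduced without a repository. Git names an object by hashing a header of the form "<type> <size>" plus a NUL byte, followed by the object's payload; for an empty tree the payload is zero bytes long. The helper name `git_object_id` below is made up for this sketch.

```python
import hashlib

def git_object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 over the header '<type> <size>' + NUL, followed by the payload."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()

EMPTY_TREE = git_object_id("tree", b"")
EMPTY_BLOB = git_object_id("blob", b"")

print(EMPTY_TREE)   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(EMPTY_BLOB)   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the id depends only on type and content, every empty tree anywhere has this same id, which is why it can carry no per-directory "keep me" information by itself.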
* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  4:28 ` David Kastrup
  2007-07-22  6:38 ` david
  2007-07-22 17:28 ` Linus Torvalds
@ 2007-07-22 17:33 ` Linus Torvalds
  [not found] ` <alpine.LFD.0.999.0707221031050.3607@woody.linux-foundation.org>
  3 siblings, 0 replies; 137+ messages in thread
From: Linus Torvalds @ 2007-07-22 17:33 UTC (permalink / raw)
  To: David Kastrup; +Cc: git

On Sun, 22 Jul 2007, David Kastrup wrote:
>
> So when we are talking about a repository tree _becoming_ empty, we
> need the information whether or whether not we should remove it upon
> becoming empty.

You don't seem to realize - although I've told you now about a million
times - that what you are talking about is:

 - technically exactly the same as ".gitignore", which for some
   unfathomable reason you cannot seem to accept.

 - except your use of "." is 100% INFERIOR exactly because the "." entry
   has no meaning in the target filesystem, so it means that the bit of
   information is no longer something that is trackable in the working
   tree.

Quite frankly, Junio would be a total idiot to take any patches that do
what you want to do. Happily, he is anything but.

                Linus

^ permalink raw reply [flat|nested] 137+ messages in thread
[parent not found: <alpine.LFD.0.999.0707221031050.3607@woody.linux-foundation.org>]
* Re: [RFC PATCH] Re: Empty directories...
  [not found] ` <alpine.LFD.0.999.0707221031050.3607@woody.linux-foundation.org>
@ 2007-07-22 18:58 ` David Kastrup
  0 siblings, 0 replies; 137+ messages in thread
From: David Kastrup @ 2007-07-22 18:58 UTC (permalink / raw)
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Jul 2007, David Kastrup wrote:
>>
>> So when we are talking about a repository tree _becoming_ empty, we
>> need the information whether or whether not we should remove it upon
>> becoming empty.
>
> You don't seem to realize - although I've told you now about a million
> times - that what you are talking about is:
>
>  - technically exactly the same as ".gitignore", which for some
>    unfathomable reason you cannot seem to accept.

Linus? Do both of us a favor and forget about the "." proposal. Since I
already dropped it, we can save time if you rant about the proposal I have
replaced it with and call me an idiot for a different reason.

> Quite frankly, Junio would be a total idiot to take any patches that do
> what you want to do. Happily, he is anything but.

And he does not come across as one.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories...
  2007-07-22  0:18 ` David Kastrup
  2007-07-22  0:37 ` Linus Torvalds
@ 2007-07-22  1:16 ` Jakub Narebski
  2007-07-22  1:39 ` David Kastrup
  1 sibling, 1 reply; 137+ messages in thread
From: Jakub Narebski @ 2007-07-22  1:16 UTC (permalink / raw)
  To: git

David Kastrup wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> On Sat, 21 Jul 2007, David Kastrup wrote:
>>> Linus, a directory is simply non-existent inside of git.
>>
>> You need to learn git first.
>>
>> A directory doesn't exist IN THE INDEX (until my patches). But you
>> need to learn about the object database and the SHA1's. That's the
>> real meat of git, and it sure as hell knows about directories.
>
> I have written up a complete explanation about the underlying concept
> in a separate thread, maybe it would make sense reading that before
> investing too much time meddling over details that don't fit the large
> picture. The point is that the object database and the SHA1 values
> track _trees_, not _directories_. And a _tree_ is just a hashing
> mechanism in the repository for files. Its existence is solely
> dependent on the existence of its contents. The only synchronization
> with directories is that when a tree becomes empty, git attempts to do
> an rmdir on the corresponding directory. And of course, if git needs
> to check out a file, it creates the necessary parent directories.
>
> Now since the physical _contents_ of a directory are already tracked
> in _trees_ by git, the only missing part is the _existence_ of the
> directory itself: a directory must exist as long as there is a tree
> (and thus content) connected with it, but the reverse does not hold:
> without a tree, the directory can still exist. Which we can represent
> by a repository entry named "." without content (the content is
> already catered for by the _tree_). This must _not_ be represented by
> a _tree_ node since there is no content, and a tree without content by
> _definition_ does not exist.
>
> I must be really bad at explaining things, or I am losing a fight
> against preconceptions fixed beyond my imagination.

I don't understand you, or you don't understand git. A "tree" object in the
object database (in the repository) represents a directory in the working
area. There was never any problem with having empty trees in the object
database, or having links to an empty directory in the superdir. We don't
have to change anything about the object database.

Git's problems with empty directories stem from the fact that the index
didn't have directories. The index is a flattened version of the root tree,
and before subproject support it contained _only_ info about blobs (file
contents). At least until Linus's patch...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply [flat|nested] 137+ messages in thread
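The remark above that the index is a flattened, blob-only version of the root tree can be made concrete with a small model. This is a sketch under assumptions: trees are modelled as nested dicts and the index as a path-to-content mapping; `flatten` and `unflatten` are hypothetical helpers and do not mirror git's real index format.

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Flatten a nested {name: bytes | dict} tree into {path: bytes}.

    Only blobs produce entries, as in an index without directory records.
    """
    index = {}
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, dict):
            index.update(flatten(entry, path + "/"))
        else:
            index[path] = entry
    return index

def unflatten(index: dict) -> dict:
    """Rebuild a nested tree from the flat index; directories appear only as needed."""
    tree = {}
    for path, content in index.items():
        *dirs, leaf = path.split("/")
        node = tree
        for d in dirs:
            node = node.setdefault(d, {})
        node[leaf] = content
    return tree

root = {"README": b"hi", "src": {"main.c": b"int x;"}, "fonts": {}}

index = flatten(root)          # the empty "fonts" directory produces no entry
roundtrip = unflatten(index)   # so it cannot be reconstructed

print(sorted(index))
print("fonts" in roundtrip)
```

In this model the empty directory is lost on the way through the index even though the nested tree representation could hold it, which matches the distinction drawn above between the object database and the index.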
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 1:16 ` Jakub Narebski @ 2007-07-22 1:39 ` David Kastrup 2007-07-22 12:06 ` Jakub Narebski 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-22 1:39 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: > David Kastrup wrote: > >> I must be really bad at explaining things, or I am losing a fight >> against preconceptions fixed beyond my imagination. > > I don't understand you, or you don't understand git. "Tree" object > in object database (in repository) represents a directory in the > working area. There was never any problem with having empty trees in > object database, or having links to empty directory in the superdir. > We don't have to change anything about object database. I disagree here. The object database _can_ represent an _empty_ directory that has been added explicitly, because up to now no operations existed that actually left an empty tree. But it can't distinguish a _non_-empty directory that has been added explicitly from non-empty directory that has not been added explicitly. To wit: after the sequence mkdir a touch a/b git-add a git-commit -m x git-rm a/b git-commit -m x I expect git to retain an empty directory a. But the _tree_ now can't be different from the tree in the situation mkdir a touch a/b git-add a/b git-commit -m x git-rm a/b git-commit -m x because after step 1, the trees have identical contents, and so there is nothing at the _identical_ step 2 that could cause different behavior. But in the second case, git must _not_ retain a. So we need to record the information that in the first case, a was added explicitly. And this can't be done with the current repository layout. It doesn't buy us anything that we _have_ a representation available for an _empty_ tree added explicitly. We need this "added explicitly" information for _every_ tree, not just empty ones. 
And a perfectly consistent way is to make those trees with an explicitly added directory _non-empty_, by virtue of putting a file "." in them. This file, of course, exists in every physical directory, but we may or may not decide to let it be tracked by git, using the gitignore mechanism on the pattern ".". Perfectly expedient. > The problems with git problems with empty directories stems from the > fact that index didn't have directories. That basically implies that no information about directories could be tracked in the repository. And yes, we need appropriate information in the index. Again, the information whether a directory was added explicitly. > Index is flattened version of root tree, and before subproject > support it contained _only_ info about blobs (file contents). And the repository is a versioned and hierarchically hashed version of the index, but its trees contain _no_ information that is not already inherently represented by the files alone. Permitting empty trees would change that fundamental property, and it would not buy us the ability to actually track directories: see above. So it is not worth the trouble to assign any meaningful concept to persisting empty trees rather than make them a case for git-fsck. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 1:39 ` David Kastrup @ 2007-07-22 12:06 ` Jakub Narebski 2007-07-22 13:53 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2007-07-22 12:06 UTC (permalink / raw) To: David Kastrup; +Cc: git, Linus Torvalds On Sun, 22 July 2007, David Kastrup wrote: > Jakub Narebski <jnareb@gmail.com> writes: >> David Kastrup wrote: >> >>> I must be really bad at explaining things, or I am losing a fight >>> against preconceptions fixed beyond my imagination. Or you are wrong... >> I don't understand you, or you don't understand git. "Tree" object >> in object database (in repository) represents a directory in the >> working area. There was never any problem with having empty trees in >> object database, or having links to empty directory in the superdir. >> We don't have to change anything about object database. > > I disagree here. The object database _can_ represent an _empty_ > directory that has been added explicitly, because up to now no > operations existed that actually left an empty tree. But it can't > distinguish a _non_-empty directory that has been added explicitly > from non-empty directory that has not been added explicitly. True. I forgot about that. Although I'd rather say that we want distinguish between automatically cleaned up directory (directory which will be deleted if all files in it would be deleted, and would be untracked if all tracked files in it would be deleted), and "sticky" directory, which is explicitely tracked and have to be explicitely deleted. The fact that it was added explicitely or non explicitely is orthogonal to that. IMHO it would be best to first provide plumbing infrastructure (as e.g. it was the case of submodule support), then add option to git-update-index to change the "stickiness"/"autoremoval" status of a directory (of a tree), and _last_ think about how to change the porcelain (git-add and git-rm). [...] 
> But in the second case, git must _not_ retain a. So we need to record > the information that in the first case, a was added explicitly. And > this can't be done with the current repository layout. It doesn't buy > us anything that we _have_ a representation available for an _empty_ > tree added explicitly. We need this "added explicitly" information > for _every_ tree, not just empty ones. > > And a perfectly consistent way is to make those trees with an > explicitly added directory _non-empty_, by virtue of putting a file > "." in them. This file, of course, exists in every physical > directory, but we may or may not decide to let it be tracked by git, > using the gitignore mechanism on the pattern ".". Perfectly > expedient. Here we disagree. I think putting "." in a tree as marker of having it not be automatically deleted when empty, as opposed to marking tree using filemode in the parent, is not a good idea. The only advantage to the "." idea is that it can use gitignore mechanism (both in-tree .gitignore, tracked or not, and info/exclude file). But I also think that the fact that gitignore mechanism is recursive is more of disadvantage than advantage. First, it is _not_ consistent. Working directory trees _always_ have '.' in them, while trees would have or would have not it, depending if they would be "sticky" or "autoremoved". Second, the "easy implementation" is anything but easy. "git add ." as a way to mark directory as "sticky" is not backward compatibile: currently it mean to add _all contents_ of current directory. Implementation is tricky: as we have seen trying to unlink '.' or create '.' can unfortunately succeed on [some Sun OS, and UFS filesystem] (which follows POSIX stupidly to the letter) f**king up the filesystem. The alternative proposal of adding "magic mode" to mark directory as "not remove when empty" is largely tested; it is very similar to the subproject support. Third, is contrary to the git philosophy of tracking contents. 
"Stickiness" is an attribute; the fact that directory is explicitely tracked or not does not change contents of a directory. Compare to 'blob' which contains only contents of a file: not a filename, not a pathname, not [subset of] filemode. Fourth, is very artificial. What would you put for filemode for '.'? 040000 (i.e. directory)? What would you put for sha1? Sha1 of an empty directory? Of an empty blob? 0{40} (which is bad idea because git-diff-tree uses 0{40} to represent 'not existance')? >> The problems with git problems with empty directories stems from the >> fact that index didn't have directories. > > That basically implies that no information about directories could be > tracked in the repository. And yes, we need appropriate information > in the index. Again, the information whether a directory was added > explicitly. Whether directory is automatically managed by git (automatically removed or untracked). But we need directory entry in index for git-diff, for example to recognize if there is or there is not empty directory, or if a directory is automanaged or not. >> Index is flattened version of root tree, and before subproject >> support it contained _only_ info about blobs (file contents). > > And the repository is a versioned and hierarchically hashed version of > the index, but its trees contain _no_ information that is not already > inherently represented by the files alone. [...] The above sentence is nonsensical. Index is helper for repository, and can be derived from repository. Not vice versa. Trees do contain information which is not inherently present by the blobs. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 12:06 ` Jakub Narebski @ 2007-07-22 13:53 ` David Kastrup 2007-07-22 20:26 ` Jakub Narebski 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-22 13:53 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub, this mail is too long already, and it does not make sense to tack a changed proposal to its end since then the readers will be exhausted at the time they come there. So I'll instead tack a followup to the "big picture" mail instead where I outline a modified approach which is presumably easier to understand and completely backwards-compatible, incorporating your feedback. There is probably little sense in wasting your time on a detailed response: feel free to point out where you don't see myself making sense. I have no problem with people coming to different conclusions that I do, but I would prefer it if it is not because they consider myself a raving lunatic, but because they have different opinions regarding the details. "I can follow you, but I disagree with your conclusion" is perfectly fine for now since I am going to propose something else, anyway. Thanks for the feedback. It gave me some good ideas. Jakub Narebski <jnareb@gmail.com> writes: > On Sun, 22 July 2007, David Kastrup wrote: >> Jakub Narebski <jnareb@gmail.com> writes: >>> David Kastrup wrote: >>> >>>> I must be really bad at explaining things, or I am losing a fight >>>> against preconceptions fixed beyond my imagination. > > Or you are wrong... Well, there is little reason for you to take my word on it, but I happen to have a history of designing and implementing systems where I have been responsible for every single byte, bootloader, firmware, applications, target compiler, assembler, whatever. I have been exposed to Unix and working with it several years before Linux even existed. I also have a track record of being not exactly stupid. So I pretty much can rule out that I am wrong on the factual side. 
But where I may be wrong is in estimating the how obvious the design can appear to others, and how useful and maintainable for others it may be in the long run. Linus says "code talks", but that's actually not half the story. If my code says that it works and the evidence is there, but nobody is able to understand _why_ it works, it has no place in a project where I am not permanently around. If smart people don't get what I am talking about, it does not matter that the patch is surprisingly well-contained: it will be a maintenance nightmare because people will never figure out why something stopped working after some particular change. >> I disagree here. The object database _can_ represent an _empty_ >> directory that has been added explicitly, because up to now no >> operations existed that actually left an empty tree. But it can't >> distinguish a _non_-empty directory that has been added explicitly >> from non-empty directory that has not been added explicitly. > > True. I forgot about that. Thanks. It is almost a revelation that anybody can agree on any point with me at the moment. > IMHO it would be best to first provide plumbing infrastructure (as > e.g. it was the case of submodule support), then add option to > git-update-index to change the "stickiness"/"autoremoval" status of > a directory (of a tree), and _last_ think about how to change the > porcelain (git-add and git-rm). Sure. It does no harm to think about reducing the amount of breaking porcelain, though. > [...] > >> And a perfectly consistent way is to make those trees with an >> explicitly added directory _non-empty_, by virtue of putting a file >> "." in them. This file, of course, exists in every physical >> directory, but we may or may not decide to let it be tracked by >> git, using the gitignore mechanism on the pattern ".". Perfectly >> expedient. > > Here we disagree. I think putting "." 
in a tree as marker of having > it not be automatically deleted when empty, as opposed to marking > tree using filemode in the parent, is not a good idea. Well, "not a good idea" is a far step forward from "stupid idiot babbling nonsense", so we may make progress towards actually being able to _weigh_ different options. I can actually associate with "not a good idea", not least because nobody else seems to get the idea, and that makes it infeasible for maintenance. So I'll address some points and then propose a different way of implementing what will in the end amount to rather similar semantics, but with a different view of looking at those semantics, one that corresponds well with the implementation. > The only advantage to the "." idea is that it can use gitignore > mechanism (both in-tree .gitignore, tracked or not, and info/exclude > file). But I also think that the fact that gitignore mechanism is > recursive is more of disadvantage than advantage. > > First, it is _not_ consistent. Working directory trees _always_ have > '.' in them, while trees would have or would have not it, depending > if they would be "sticky" or "autoremoved". Let me point out again that this inconsistency is already present in the difference of tracked and untracked _files_: they are always in the working directory, while trees have or not have them, depending on whether they are "registered" or "not". There is no inconsistency involved here, but it seems to make people _very_ uncomfortable to factor out the "stays around even if empty" functionality and call it "dir/." from the "can hold content" functionality which is in effect called "dir/", and basically associate tracked physical existence just with the former. The recursiveness of the gitignore mechanism has the advantage that when maintaining a large repository with actual or logical subprojects, one does not need to pick a single policy for all subprojects. I think that is quite important. 
It could possibly be achieved with some other method of having per-subproject configuration, but I see little wrong in using what is there and documented already. > Second, the "easy implementation" is anything but easy. "git add ." > as a way to mark directory as "sticky" is not backward compatibile: > currently it mean to add _all contents_ of current directory. > Implementation is tricky: as we have seen trying to unlink '.' or > create '.' can unfortunately succeed on [some Sun OS, and UFS > filesystem] (which follows POSIX stupidly to the letter) f**king up > the filesystem. I was not suggesting actually leaving any such calls in place: after all, they would presumably lead to error messages. But I agree that this could lead to nasty surprises when somebody with a legacy version of git worked with a repository containing "." as explicit entries of some file type. > The alternative proposal of adding "magic mode" to mark directory as > "not remove when empty" is largely tested; it is very similar to the > subproject support. Good. Because it is what I converged to last night. > Third, is contrary to the git philosophy of tracking contents. > "Stickiness" is an attribute; the fact that directory is explicitely > tracked or not does not change contents of a directory. Compare to > 'blob' which contains only contents of a file: not a filename, not a > pathname, not [subset of] filemode. > > Fourth, is very artificial. What would you put for filemode for '.'? > 040000 (i.e. directory)? Taken already. By something very artificial, namely a tree... Yes, this was a wart in my proposal. > What would you put for sha1? Sha1 of an empty directory? Some fixed value. Everywhere the same. Not really relevant. >> That basically implies that no information about directories could >> be tracked in the repository. And yes, we need appropriate >> information in the index. Again, the information whether a >> directory was added explicitly. 
> > Whether directory is automatically managed by git (automatically > removed or untracked). But we need directory entry in index for > git-diff, for example to recognize if there is or there is not empty > directory, or if a directory is automanaged or not. One conclusion that I have come to (and I think I am in agreement with Linus here) is that the information "empty or not" is actually useless separately: when I add files below a directory to the repository, the directory _can't_ be empty. And git has no way of knowing whether it is non-empty because I wanted the directory to be there, or whether it is non-empty because I could not have checked in the files into the tree below it otherwise. >> And the repository is a versioned and hierarchically hashed version >> of the index, but its trees contain _no_ information that is not >> already inherently represented by the files alone. [...] > > The above sentence is nonsensical. Index is helper for repository, > and can be derived from repository. Not vice versa. > > Trees do contain information which is not inherently present by the > blobs. Could you give examples for such information? As long as we are not talking about _history_, I am at a loss at what else you mean. File names and permissions? -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 13:53 ` David Kastrup @ 2007-07-22 20:26 ` Jakub Narebski 2007-07-22 22:57 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2007-07-22 20:26 UTC (permalink / raw) To: David Kastrup; +Cc: git David Kastrup wrote: > "I can follow you, but I disagree with your conclusion" is perfectly > fine for now since I am going to propose something else, anyway. > > Thanks for the feedback. It gave me some good ideas. You are welcome. > Jakub Narebski <jnareb@gmail.com> writes: >> On Sun, 22 July 2007, David Kastrup wrote: >>> Jakub Narebski <jnareb@gmail.com> writes: >>>> David Kastrup wrote: >>>> >>>>> I must be really bad at explaining things, or I am losing a fight >>>>> against preconceptions fixed beyond my imagination. >> >> Or you are wrong... > > Well, there is little reason for you to take my word on it, but I > happen to have a history of designing and implementing systems where I > have been responsible for every single byte, bootloader, firmware, > applications, target compiler, assembler, whatever. I have been > exposed to Unix and working with it several years before Linux even > existed. I also have a track record of being not exactly stupid. > > So I pretty much can rule out that I am wrong on the factual side. Big words. First, there is little matter of something like area of competence. You might be systems master, but your idea about snapshot based distributed revision control systems can be wrong because DSCM are outside the area you know most about. Second, even if you are a master at given topic, you can still be wrong. Mind you, I was not saying you are wrong. I was saying you could be. [...] >> The only advantage to the "." idea is that it can use gitignore >> mechanism (both in-tree .gitignore, tracked or not, and info/exclude >> file). But I also think that the fact that gitignore mechanism is >> recursive is more of disadvantage than advantage. [...] 
> The recursiveness of the gitignore mechanism has the advantage that
> when maintaining a large repository with actual or logical
> subprojects, one does not need to pick a single policy for all
> subprojects. I think that is quite important. It could possibly be
> achieved with some other method of having per-subproject
> configuration, but I see little wrong in using what is there and
> documented already.

I think it would be best implemented by repository config, e.g. core.dirManagement or something like that, which could be set to

  1. "autoremove" or something like that, which gives the old behavior
     of untracking a directory if it doesn't have any tracked files
     in it, and removing the directory if it doesn't have any files
     in it.
  2. "noremove" or something like that, which changes the behaviour
     to _never_ untrack a directory automatically. This can be done
     without any changes to the 'tree' object or the index. It could be
     useful for git-svn repositories.
  3. "marked" or something like that, for which you have to explicitly
     mark directories which are not to be removed when empty.
  4. "recursive" or something like that, which would automatically mark
     as "sticky" all subdirectories added in a "sticky" repository.
     OR a directory is not removed when empty if it is marked as such,
     or one of its parents is marked as such.

>> Second, the "easy implementation" is anything but easy. "git add ."
>> as a way to mark directory as "sticky" is not backward compatible:
>> currently it means to add _all contents_ of current directory.
>> Implementation is tricky: as we have seen trying to unlink '.' or
>> create '.' can unfortunately succeed on [some Sun OS, and UFS
>> filesystem] (which follows POSIX stupidly to the letter) f**king up
>> the filesystem.
>
> I was not suggesting actually leaving any such calls in place: after
> all, they would presumably lead to error messages.
But I agree that > this could lead to nasty surprises when somebody with a legacy version > of git worked with a repository containing "." as explicit entries of > some file type. The "magic mode" solution _should_ work also with older git, I think. >> Fourth, is very artificial. What would you put for filemode for '.'? >> 040000 (i.e. directory)? [...] >> What would you put for sha1? Sha1 of an empty directory? > > Some fixed value. Everywhere the same. Not really relevant. Relevant because it has to work with legacy git on strange operating systems. Because git has to fsck it (and adding special casing this "some fixed value" to git-fsck is bad, bad idea). Note that sha1 cannot be sha1 of the tree. In working area '.' is self link. You cannot create self link in git repository object. [...] >>> And the repository is a versioned and hierarchically hashed version >>> of the index, but its trees contain _no_ information that is not >>> already inherently represented by the files alone. [...] [...] >> Trees do contain information which is not inherently present by the >> blobs. > > Could you give examples for such information? As long as we are not > talking about _history_, I am at a loss at what else you mean. File > names and permissions? File names and permissions. And they bind blobs and trees together. Trees do not contain any info about history. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 137+ messages in thread
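The point that trees carry file names and modes while blobs carry only contents can be illustrated with a minimal Python sketch. It assumes the usual object naming (SHA-1 over "<type> <size>\0" followed by the payload) and a tree entry layout of "<mode> <name>\0<raw sha1>"; the helper names `object_id` and `tree_payload` are made up for this illustration and are not git functions.

```python
import hashlib

def object_id(kind: bytes, payload: bytes) -> str:
    """SHA-1 over '<kind> <size>' + NUL + payload, as used for object names."""
    header = kind + b" " + str(len(payload)).encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()

def tree_payload(entries):
    """Serialize (mode, name, hex_id) entries; assumed to be sorted already."""
    return b"".join(
        mode + b" " + name + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id in entries
    )

# The same (empty) blob is referenced by every tree below.
empty_blob = object_id(b"blob", b"")
empty_tree = object_id(b"tree", b"")

# Same content, different name or mode: the blob id is unchanged,
# but each tree gets a different id.
tree_a = object_id(b"tree", tree_payload([(b"100644", b"README", empty_blob)]))
tree_b = object_id(b"tree", tree_payload([(b"100644", b"NOTES", empty_blob)]))
tree_x = object_id(b"tree", tree_payload([(b"100755", b"README", empty_blob)]))
```

The three tree ids differ although all three trees point at the identical blob, which is the sense in which names and permissions live in the tree and not in the blob.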
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 20:26 ` Jakub Narebski @ 2007-07-22 22:57 ` David Kastrup 2007-07-23 6:05 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-22 22:57 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: > David Kastrup wrote: > >> So I pretty much can rule out that I am wrong on the factual side. > > Big words. Sure. It is not relevant, however. > First, there is little matter of something like area of competence. > You might be systems master, but your idea about snapshot based > distributed revision control systems can be wrong because DSCM are > outside the area you know most about. Slicing the concept of directory and tree into two separate things and thinking separately about them and their relation in working tree and repository is not exactly concerned with the internals. It obviously was too artificial a concept to be understandable, and likely a worse idea than necessary (whether one wants to call it too smart or too stupid for its own good may be a matter of taste). Anyway, it would be more productive if we managed to focus on the technical aspects again. I accept that my previous proposal was not fit for inclusion. > Second, even if you are a master at given topic, you can still be > wrong. > > Mind you, I was not saying you are wrong. I was saying you could be. We can leave that open since no code is going to come of the first proposal. > [...] >> The recursiveness of the gitignore mechanism has the advantage that >> when maintaining a large repository with actual or logical >> subprojects, one does not need to pick a single policy for all >> subprojects. > > I think it would be best implemented by repository config, e.g. > core.dirManagement or something like that, which could be set to > 1. 
"autoremove" or something like that, which gives old behavior > of untracking directory if it doesn't have any tracked files > in it, and removing directory if it doesn't have any files > in it. That's actually not _tracking_ a directory at all, but rather maintaining an independent directory in the parallel repository universe. No information specific to directories passes the index. > 2. "noremove" or something like that, which changes the behaviour > to _never_ untrack directory automatically. This can be done > without any changes to 'tree' object nor index. It could be useful > for git-svn repositories. I don't see how this could occur. Automatic _untracking_ would happen when one untracks (aka removes) a parent directory. But one would not do this while keeping the child. > 3. "marked" or something like that, for which you have to explicitely > mark directories which are not to be removed when empty. Equivalent to 1 in my scheme. > 4. "recursive" or something like that, which would automatically mark > as "sticky" all subdirectories added in a "sticky" repository. If they are covered by the add and not just implied by childs. That is, git-add a/b will not make "a" sticky while git-add a will make a/b sticky. > OR directory is not removed when empty if it is marked as such, > or one of its parents is marked as such. I'd not throw too much inheritance into the equation, or things become intractable too easily. > The "magic mode" solution _should_ work also with older git, I > think. I think so, too, for the repository. But of course what happens in the index with old code when new data types get added is a case for review, testing and praying. >>> Fourth, is very artificial. What would you put for filemode for '.'? >>> 040000 (i.e. directory)? > [...] >>> What would you put for sha1? Sha1 of an empty directory? >> >> Some fixed value. Everywhere the same. Not really relevant. > > Relevant because it has to work with legacy git on strange operating > systems. 
Because git has to fsck it (and adding special casing this > "some fixed value" to git-fsck is bad, bad idea). I did not mean "arbitrary value", but the value would be computed in a standard way from the node, and since the node would be the same everywhere, the hash would be too. > Note that sha1 cannot be sha1 of the tree. In working area '.' is > self link. You cannot create self link in git repository object. Certainly. And the idea was to have "." be isolated from the contents of the tree, basically treating it as a sibling of the other entries. Which is, in a way, how "." shared one namespace in Unix with what amounts to _children_ of the corresponding tree. So that was some inspiration here, probably too much so. > [...] >>>> And the repository is a versioned and hierarchically hashed version >>>> of the index, but its trees contain _no_ information that is not >>>> already inherently represented by the files alone. [...] > [...] >>> Trees do contain information which is not inherently present by the >>> blobs. >> >> Could you give examples for such information? As long as we are not >> talking about _history_, I am at a loss at what else you mean. File >> names and permissions? > > File names and permissions. And they bind blobs and trees together. Trees bind blobs and trees together? Anyway, I consider the names and permissions properties of the files and their identity. Stripping out the blobs from under them does not actually add any information: the trees still don't contain any information that would have necessitated looking at directories rather than just files, their names, permissions and content in the work space. But you are right in that the tree can't be replaced by the blobs. It actually needs the files (namely their full names and permissions) to reconstruct it. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC PATCH] Re: Empty directories... 2007-07-22 22:57 ` David Kastrup @ 2007-07-23 6:05 ` David Kastrup 2007-07-23 7:45 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-23 6:05 UTC (permalink / raw) To: Jakub Narebski; +Cc: git David Kastrup <dak@gnu.org> writes: > Jakub Narebski <jnareb@gmail.com> writes: > >> I think it would be best implemented by repository config, e.g. I got sidetracked here: the gitignore stuff has in the dirmod scheme actually no code or concept overlap with the actual scheme, so it can be considered a distraction for now and its implementation and discussion tabled. It has one disadvantage: in order to get _recursive_ behavior in one tree, one needs to use the "." pattern in the .gitignore file of the respective directory, and having a .gitignore file in that directory sort of defeats the idea of not having .gitignore directories around... Of course, a single .gitignore file is better than ones one has to distribute through the tree. >> core.dirManagement or something like that, which could be set to >> 1. "autoremove" or something like that, which gives old behavior >> of untracking directory if it doesn't have any tracked files >> in it, and removing directory if it doesn't have any files >> in it. > > That's actually not _tracking_ a directory at all, but rather > maintaining an independent directory in the parallel repository > universe. No information specific to directories passes the index. Note: that was merely a comment on semantics, not on the matter. >> 2. "noremove" or something like that, which changes the behaviour >> to _never_ untrack directory automatically. This can be done >> without any changes to 'tree' object nor index. It could be useful >> for git-svn repositories. > > I don't see how this could occur. Automatic _untracking_ would happen > when one untracks (aka removes) a parent directory. But one would not > do this while keeping the child. 
Correction: if there was a --directory option and one used it for git-rm (or no -r was given, so just one directory level was affected), one _could_ untrack stuff on the git side accidentally. And for something like git-svn, this might be a bad idea. So there is conceivably a market for an option that never untracks a non-empty tree.

>> 3. "marked" or something like that, for which you have to explicitly
>> mark directories which are not to be removed when empty.
>
> Equivalent to 1 in my scheme.

At least if scheme 1 does not forbid some _explicit_ way of saying "track this and I really mean it".

>> 4. "recursive" or something like that, which would automatically mark
>> as "sticky" all subdirectories added in a "sticky" repository.
>
> If they are covered by the add and not just implied by children. That is,
> git-add a/b
> will not make "a" sticky while
> git-add a
> will make a/b sticky.

Addition: I was thinking so much of my implementation and its semantics that I did not consider one possibility that you might mean here:

When adding a/b, always also add a (and the whole hierarchy above it) automatically as sticky. That is, disallow unsticky directories in the repository altogether. That would mean that

    git-add a/b;git-commit -m x;git-rm a/b;git-commit -m x

might not be a noop if a was not in the repository previously: it would cause a to stay around sticky until removed. With all other schemes, however, it would cause a to be removed "on behalf of the user" even if the user intended it to stay around.

Indeed, this scheme might by far be the easiest to understand. Having no autoremoval at all in levels higher than the deleted level is something that people might easily understand: delayed removal just does not happen anymore, and git never deletes a directory unless told to.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
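To compare the policies discussed in this subthread, here is a small Python sketch of which directories would disappear after untracking one file. Everything in it is hypothetical: the policy names are the ones floated above for a possible core.dirManagement setting, `pruned_after_rm` is not a git function, "noremove" is read here as "never remove a directory automatically", and the work tree is assumed to hold no untracked files.

```python
import posixpath

def ancestors(path):
    """Yield the parent directories of `path`, innermost first."""
    d = posixpath.dirname(path)
    while d:
        yield d
        d = posixpath.dirname(d)

def pruned_after_rm(path, tracked, sticky, policy):
    """Directories that would be auto-removed once `path` is untracked.

    tracked: set of tracked file paths.
    sticky:  set of explicitly marked directories.
    policy:  "autoremove", "noremove", "marked" or "recursive".
    """
    remaining = tracked - {path}
    pruned = []
    for d in ancestors(path):
        if any(f.startswith(d + "/") for f in remaining):
            break  # still needed to hold tracked content
        if policy == "noremove":
            break  # never remove a directory automatically
        if policy == "marked" and d in sticky:
            break  # explicitly marked: keep even when empty
        if policy == "recursive" and (
            d in sticky or any(a in sticky for a in ancestors(d))
        ):
            break  # marked itself, or below a marked directory
        pruned.append(d)
    return pruned
```

With `tracked = {"a/b"}`, removing `a/b` prunes `a` under "autoremove" and nothing under "noremove", which corresponds to the add/commit/rm/commit sequence above being a noop in one scheme and not in the other.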
* Re: [RFC PATCH] Re: Empty directories... 2007-07-23 6:05 ` David Kastrup @ 2007-07-23 7:45 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-23 7:45 UTC (permalink / raw) To: git David Kastrup <dak@gnu.org> writes: > Addition: I was thinking so much of my implementation and its > semantics that I did not consider one possibility that you might mean > here: > > When adding a/b, always also add a (and the whole hierarchy above it) > automatically as sticky. Namely disallow unsticky directories in the > repository at all. That would mean that > > git-add a/b;git-commit -m x;git-rm a/b;git-commit -m x > > might not be a noop if a was not in the repository previously: it > would cause a to stay around sticky until removed. With all other > schemes, however, it would cause a to be removed "on behalf of the > user" even if the user intended it to stay around. > > Indeed, this scheme might by far be the easiest to understand. > Having no autoremoval at all in levels higher than the deleted level > is something that people might easily understand: delayed removal > just does not happen anymore, and git never deletes a directory > unless told to. And of course, it would be a nuisance for people managing a patch-based workflow. But those can actually easily set the repository preferences differently, and even find -type d -empty -delete is not too hard to do. So it would even be feasible as default. But I think that in practice, the "track only what has been added recursively" approach is a good default. And since patches without dir information never add anything recursively, it would mostly keep the directories clean. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
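The cleanup step mentioned above (the find invocation that deletes empty directories) can be sketched in Python as follows. This is only an illustration under assumptions: `remove_empty_dirs` is a made-up helper, it keeps the top-level directory itself, and it skips anything below a `.git` directory, which the find one-liner would not do on its own.

```python
import os

def remove_empty_dirs(root):
    """Remove empty directories below `root`, deepest first.

    Returns the removed paths relative to `root`.
    """
    removed = []
    # Bottom-up walk, so children are handled before their parents.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        rel = os.path.relpath(dirpath, root)
        if rel == os.curdir or ".git" in rel.split(os.sep):
            continue  # keep the root and leave repository metadata alone
        # Re-list instead of trusting the walk: children may be gone by now.
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(rel)
    return removed
```

Because the walk is bottom-up and each directory is listed again right before the check, a chain such as a/b/c with no files in it is removed completely in one pass.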
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 23:50 ` Linus Torvalds 2007-07-22 0:18 ` David Kastrup @ 2007-07-22 0:34 ` David Kastrup 1 sibling, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-22 0:34 UTC (permalink / raw) To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, 21 Jul 2007, David Kastrup wrote:
>
>> Linus, a directory is simply non-existent inside of git.
>
> You need to learn git first.
>
> A directory doesn't exist IN THE INDEX (until my patches). But you
> need to learn about the object database and the SHA1's. That's the
> real meat of git, and it sure as hell knows about directories.

To put it another way: what would happen if trees were removed from git's repository completely? Instead we would just stipulate that git should only track files, not trees, and that it would remove an outside directory when removing the last file from the repository that can't be accommodated without such a directory.

Now the effect would be that git would become quite inefficient. But it would not change its behavior in any other way. Because it knows _zilch_ about directories. It knows about the hierarchy of the _contents_, but the directories, the physical entities in the work tree? It deduces a convenient point in time to try deleting them (when a tree collapses), and it deduces that they are there as long as it is tracking their content, but no information about a _directory_ other than its _contents_ ever enters the repository or index.

About its _existence_, git only keeps circumstantial evidence.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 17:38 ` David Kastrup 2007-07-21 17:52 ` Simon 'corecode' Schubert 2007-07-21 23:50 ` Linus Torvalds @ 2007-07-22 4:00 ` Brian Gernhardt 2 siblings, 0 replies; 137+ messages in thread From: Brian Gernhardt @ 2007-07-22 4:00 UTC (permalink / raw) To: David Kastrup; +Cc: Linus Torvalds, git

On Jul 21, 2007, at 1:38 PM, David Kastrup wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> So git filenames are very much a "stream of bytes", not anything
>> else. And they need to sort 100% reliably, always the same way, and
>> never with any localized meaning.
>
> There is some utf-8/Unicode trouble to be expected in connection with
> that eventually: some, but not all operating and/or file systems
> canonicalize file names, replacing accented letters by a combining
> accent and the letter. But that's beside the point.

This issue exists today. OS X does a number of things to filenames, one of which is normalizing all UTF-8 file names. The resulting error is wholly non-intuitive, but easy to solve. Git thinks both that the file exists under the name it expects and that the file is being ignored as the name OS X uses. The solution is to put the OS X normalized form into .git/info/exclude.

Any other solution involves platform-dependent hackery and inclusion of Unicode libraries. I perused this for a short while some months ago, but was convinced to leave it be.

~~ Brian
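The mismatch described above can be reproduced in a few lines of Python. This is a sketch, not a description of what any particular file system does: the file name is invented, and treating the decomposed form as "what the OS hands back" is an assumption based on the description of accented letters being replaced by a letter plus combining accent.

```python
import unicodedata

name = "caf\u00e9.txt"  # "é" as a single precomposed code point

# Composed and decomposed forms look identical on screen
# but are different byte streams.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

def same_file_name(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 names after normalizing both to NFC."""
    def norm(raw: bytes) -> str:
        return unicodedata.normalize("NFC", raw.decode("utf-8"))
    return norm(a) == norm(b)
```

A tool that treats names as plain bytes sees `nfc` and `nfd` as two unrelated files, while `same_file_name(nfc, nfd)` is true; that comparison is the kind of Unicode-library dependence the message above would rather avoid.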
* Re: [RFC PATCH] Re: Empty directories... 2007-07-21 5:08 ` Linus Torvalds 2007-07-21 5:28 ` David Kastrup @ 2007-07-28 8:44 ` David Kastrup 1 sibling, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-28 8:44 UTC (permalink / raw) To: git Linus Torvalds <torvalds@linux-foundation.org> writes: > Why? Because my preliminary patches sort the index entries wrong. A > directory should always sort *as*if* it had the '/' at the end. > > See base_name_compare() for details. > > And we've never done that for the index, because the index has never had > this issue (since it never contained directories). So sit down and compare > base_name_compare (for tree entries) with cache_name_compare() (for index > entries), and see how the latter doesn't care about the type of names. > > This was actually something that I hit already with subproject support, > and one of my very first patches even had some (aborted) code to start > sorting subprojects in the index the way we sort directories. > > And I *should* have done it that way, but I never did. It now makes the > S_ISDIR handling harder, because directories really do have to be sorted > as if they had the '/' at the end, or "git-fsck" will complain about bad > sorting. > > Sad, sad, sad. It effectively means that S_IFGITLINK is *not* quite the > same as S_IFDIR, because they sort differently. Duh. > > Of course, it seldom matters, but basically, you should test a directory > structure that has the files > > dir.c > dir/test > > in it, and the "dir" directory should always sort _after_ "dir.c". > > And yes, having the index entry with a '/' at the end would handle that > automatically. Personally, I am not much in favor of using different names in index and repository. > As it is, with the "mode" difference, it instead needs to fix up > "cache_name_compare()". 
Admittedly, that would actually be a cleanup > (since it would now match base_name_compare() in logic, and could actually > use that to do the name comparison!), but it's a damn painful cleanup > because we don't even pass in the mode to "cache_name_compare()", since we > never needed it. > > Gaah. > > cache_name_compare itself isn't used in that many places, dir.c and readcache.c > but it's used by "index_name_pos()/cache_name_pos()", which *is* > used in many places. cache_name_pos: builtin-apply.c builtin-blame.c builtin-checkout-index.c builtin-ls-files.c builtin-mv.c builtin-read-tree.c builtin-rm.c builtin-update-index.c diff.c diff-lib.c dir.c merge-index.c sha1_name.c unpack-trees.c wt-status.c index_name_pos: read-cache.c > And again, that one doesn't even have the mode, so it cannot pass it > down. > So it probably *is* easier to add the '/' at the end of the name > instead, to make directories sort the right way in the index. I'd > still suggest you *also* make the mode be S_IFDIR, though (and > preferably make git-fsck actually verify that the mode and the last > character of the name matches!). Actually, pretty much all of the above files are likely to get touched by directory support one way or another anyway. One really should aim for the cleanest solution in the long run, and this for me more or less means that it makes no sense to have different names in index and repository. Putting that slash in always would probably simplify some logic in the repository as well, but I don't really like something as marker-like as "/" in the data structures. Putting a slash there would involve a three-phase plan: a) make fsck and the other code deal gracefully with either slash or no slash. Wait until everybody uses this code. b) make the code actually _put_ slashes there. Wait until everybody has used this code. 
c) deal with it for all eternity, oops: since rewriting the cryptographic history of existing repositories is pretty much out as far as I understand (which might be insufficient), one has to navigate around slash/noslash all the time when accessing repositories, including the sorting.

The index, however, can at some point in time phase out the slash-specific sorting. There is no such thing as prehistoric indexes we would need to mind.

I guess that looks like not being worth the pain. Double the code or no money back.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply [flat|nested] 137+ messages in thread
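The sorting difference discussed in this message can be illustrated with a small model. This is only a sketch in Python, not git's C code: the helper names `index_style_key` and `tree_style_key` are hypothetical, and the key functions merely imitate the behaviour described above (plain byte comparison for index entries, "as if the name ended in '/'" for directories in tree objects).

```python
# Simplified model of the two orderings; not the real git implementation.

def index_style_key(name: str) -> bytes:
    # Models cache_name_compare() as described: names are compared as
    # plain bytes and the entry type is ignored.
    return name.encode()

def tree_style_key(name: str, is_dir: bool) -> bytes:
    # Models base_name_compare() as described: a directory compares
    # as if its name had a trailing '/'.
    return name.encode() + (b"/" if is_dir else b"")

# The test case suggested in the quoted text: file "dir.c" next to directory "dir".
entries = [("dir", True), ("dir.c", False)]

index_order = [n for n, _ in sorted(entries, key=lambda e: index_style_key(e[0]))]
tree_order = [n for n, d in sorted(entries, key=lambda e: tree_style_key(e[0], e[1]))]

print("index-style order:", index_order)
print("tree-style order: ", tree_order)
```

Because '.' (0x2e) sorts before '/' (0x2f), the tree-style key puts "dir.c" ahead of "dir", while the plain byte comparison puts the shorter name "dir" first. That mismatch is what a directory entry in the index would have to account for.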
[parent not found: <alpine.LFD.0.999.0707202135450.27249@woody.linux-foundation.org>]
* Re: [RFC PATCH] Re: Empty directories...
  [not found] ` <alpine.LFD.0.999.0707202135450.27249@woody.linux-foundation.org>
@ 2007-07-21 5:15 ` David Kastrup
  0 siblings, 0 replies; 137+ messages in thread
From: David Kastrup @ 2007-07-21 5:15 UTC (permalink / raw)
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, 21 Jul 2007, David Kastrup wrote:
>>
>> Ok, I have now acquired enough passing familiarity with the code
>> that I find part of my way around it. Most of your patch looks
>> like it caters for the S_ISDIR type not previously in use in the
>> index (how about the repository?).
>
> The object database has always had S_ISDIR (well, "always" is since
> very early on, when I realized that flat trees didn't cut it).

Then I think I have a bit of a problem: I should think that S_ISDIR in the repository presumably marks a tree object (still very fuzzy around the concepts here). An explicitly checked-in directory (under my scheme always named "." inside of its tree) would presumably also have S_ISDIR in the repository but behave quite differently.

> As far as I can tell, it would have been exactly the same thing as the
> S_IFDIR, just instead of the S_IFDIR check, you'd have had to check the
> end of the filename for being '/'.

Relative file name of ".", more or less. Both names satisfy S_IFDIR in the filesystem, though.

> Otherwise? Exactly the same.

> The more important thing is in many ways the object storage, and
> that's also the reason for doing the index the way I did - it more
> closely matches what the object storage does (ie the "index" ends up
> mirroring a linearized and unpacked "tree" object).

I still have to get enough of a clue about the object store to see how this pans out. I would not want to have the "." objects marked as type "tree" and empty if I can avoid it.
It seems unclean, would need extra case separations all over the place, violate the "empty trees evaporate" property and also waste a good place for tracking permissions or other attributes in future. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: Empty directories...
  2007-07-18 16:23 ` Linus Torvalds
  2007-07-18 16:33 ` Linus Torvalds
  2007-07-18 16:39 ` Matthieu Moy
@ 2007-07-18 17:34 ` David Kastrup
  2 siblings, 0 replies; 137+ messages in thread
From: David Kastrup @ 2007-07-18 17:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 18 Jul 2007, David Kastrup wrote:
>
>> b) The problem is not just that empty directories don't get added
>> into the repository. They also don't get removed again when
>> switching to a different checkout.
>
> Bzzt. Wrong.
>
> We *do* remove directories when all files under them go away.

But empty directories which were empty to start with don't go away since they are not tracked. And that means that their parents don't go away.

Git will remove directories which _had_ git-tracked content prior to the checkout. But it will not register empty directories created outside of git, and consequently will not remove them.

> HOWEVER (and this is where one of the reasons for not tracking them
> comes in):
>
> ** YOU CANNOT REMOVE A DIRECTORY IF IT HAS SOME UNTRACKED CONTENTS **
>
> Think about that for five seconds, then think about it some
> more. Ponder it.

Linus, condescension is all very nice, but I already told you: I had a directory hierarchy created outside of git's control (every file comes into being first outside of git). This hierarchy contained empty directories. The whole hierarchy was committed into git. git silently skipped registering empty directories. Then a different version got checked out which did not contain the directory hierarchy in question. And git left the (unregistered) empty directories in, as well as all their parent directories.

And that is just plain wrong.

> So the fact is, git *already* does as good of a job as it could
> possibly do wrt directories that go away: it tries to remove them if
> all the files that are tracked in it have gone away.
But I told git to track the whole directory tree recursively. There were no uncommitted files it complained about. It is not reasonable that it is afterwards unable to remove this when I check out some other tag.

> A SCM *must*not* just remove that directory. It would be
> horrible. The fact that it has untracked files in it does not make
> those untracked files "unimportant".

Sure. But that it refuses to track the files makes the total behavior an annoyance. I don't complain _how_ git handles not being able to track empty directories. I complain about it not being able to track them in the first place. The consequences are hideous.

> Maybe you feel that way about object files, but what about tracking
> some important parts of your home directory - does the fact that you
> don't necessarily track *all* of it mean that the rest is totally
> unimportant and that git should just remove it? HELL NO!

When I tell it to track it, it should not refuse. Even if it is empty. Because if it _stayed_ empty, git can then remove it (and possibly the parents) when I check out something else.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply [flat|nested] 137+ messages in thread
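The behaviour both sides describe (directories are pruned once their tracked files are gone, but anything the tracker never recorded blocks the pruning) can be reproduced with a small file-system model. This is a sketch in Python, not git's checkout code; `remove_tracked` and the directory names are made up for illustration.

```python
import os
import tempfile

def remove_tracked(root: str, tracked: list[str]) -> None:
    """Delete the tracked files, then prune parent directories that became empty."""
    for rel in tracked:
        path = os.path.join(root, *rel.split("/"))
        os.remove(path)
        parent = os.path.dirname(path)
        while parent != root:
            try:
                os.rmdir(parent)      # fails while anything is still inside
            except OSError:
                break                 # unrecorded content: leave it (and its parents)
            parent = os.path.dirname(parent)

with tempfile.TemporaryDirectory() as root:
    # "fonts" is empty, so a file-only tracker never records it.
    os.makedirs(os.path.join(root, "subdir", "fonts"))
    os.makedirs(os.path.join(root, "subdir", "doc"))
    with open(os.path.join(root, "subdir", "doc", "README"), "w") as f:
        f.write("hello\n")

    # Switching to a branch without subdir/ removes the only tracked file.
    remove_tracked(root, ["subdir/doc/README"])

    leftovers = sorted(
        os.path.relpath(os.path.join(d, name), root).replace(os.sep, "/")
        for d, names, _ in os.walk(root)
        for name in names
    )

print(leftovers)
```

In this model `subdir/doc` disappears, but `subdir/fonts` survives and keeps `subdir` alive with it, which is the "skeleton of empty directories" complained about in the thread.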
* Re: Empty directories... 2007-07-18 0:13 Empty directories David Kastrup 2007-07-18 0:35 ` Johannes Schindelin @ 2007-07-18 0:39 ` Matthieu Moy 2007-07-18 6:16 ` David Kastrup 2007-07-18 2:23 ` Junio C Hamano 2007-07-26 23:33 ` Robin Rosenberg 3 siblings, 1 reply; 137+ messages in thread From: Matthieu Moy @ 2007-07-18 0:39 UTC (permalink / raw) To: David Kastrup; +Cc: git David Kastrup <dak@gnu.org> writes: > or has somebody a better idea or interface or rationale? I understand > that there are use cases where one does not bother about empty > directories, but for a _content_ tracker, not tracking directories > because they are empty seems quite serious. ,----[ http://www.spinics.net/lists/git/msg30730.html ] | From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> | | I wouldn't personally mind if somebody taught git to just track empty | directories too. | | There is no fundamental git database reason not to allow them: it's in | fact quite easy to create an empty tree object. The problems with | empty directories are in the *index*, and they shouldn't be | insurmountable. | | [...] `---- -- Matthieu ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: Empty directories... 2007-07-18 0:39 ` Matthieu Moy @ 2007-07-18 6:16 ` David Kastrup 2007-07-18 6:30 ` Shawn O. Pearce 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-18 6:16 UTC (permalink / raw) To: git Matthieu Moy <Matthieu.Moy@imag.fr> writes: > David Kastrup <dak@gnu.org> writes: > >> or has somebody a better idea or interface or rationale? I understand >> that there are use cases where one does not bother about empty >> directories, but for a _content_ tracker, not tracking directories >> because they are empty seems quite serious. > > ,----[ http://www.spinics.net/lists/git/msg30730.html ] > | From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > | > | I wouldn't personally mind if somebody taught git to just track empty > | directories too. > | > | There is no fundamental git database reason not to allow them: > | it's in fact quite easy to create an empty tree object. > | The problems with empty directories are in the *index*, and they > | shouldn't be insurmountable. Stop right here: does that mean that I can script some "put empty directories into the last commit manually" procedure bypassing the index? -- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: Empty directories... 2007-07-18 6:16 ` David Kastrup @ 2007-07-18 6:30 ` Shawn O. Pearce 0 siblings, 0 replies; 137+ messages in thread From: Shawn O. Pearce @ 2007-07-18 6:30 UTC (permalink / raw) To: David Kastrup; +Cc: git David Kastrup <dak@gnu.org> wrote: > > ,----[ http://www.spinics.net/lists/git/msg30730.html ] > > | From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > > | > > | I wouldn't personally mind if somebody taught git to just track empty > > | directories too. > > | > > | There is no fundamental git database reason not to allow them: > > | it's in fact quite easy to create an empty tree object. > > | The problems with empty directories are in the *index*, and they > > | shouldn't be insurmountable. > > Stop right here: does that mean that I can script some "put empty > directories into the last commit manually" procedure bypassing the > index? Yes. But when you read that tree into the index later (by say checking out a branch that points to it) the empty directories will not be created, as they have no files to cause their creation. Committing changes on that branch will remove the empty directories. ;-) Oh, and the above question from you sounds like you think you can modify the last commit to include new directories that weren't there before. You cannot do that without changing the tree SHA-1, which will cause the commit SHA-1 to change. That in turns means you are not actually adding to the last commit but instead are creating an entirely different commit. History in Git is always immutable. -- Shawn. ^ permalink raw reply [flat|nested] 137+ messages in thread
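Two points from this reply lend themselves to a quick illustration: an empty tree object is easy to create, and any change to a tree changes the id of the commit that refers to it. The following Python sketch hashes objects the way git's loose-object format does (`"<type> <size>\0"` followed by the body, hashed with SHA-1); the helper `git_object_id` and the author line are hypothetical and exist only for this example.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    # Object id = SHA-1 over "<type> <size>\0" followed by the object body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# An empty tree is simply a tree object with a zero-length body.
empty_tree = git_object_id("tree", b"")

# A tree with one entry: "<mode> <name>\0" followed by the raw 20-byte blob id.
empty_blob = git_object_id("blob", b"")
tree_with_file = git_object_id(
    "tree", b"100644 README\0" + bytes.fromhex(empty_blob)
)

def commit_id(tree: str) -> str:
    # Hypothetical author/committer line, used only to make the example complete.
    who = "A U Thor <author@example.com> 1184740200 +0000"
    body = f"tree {tree}\nauthor {who}\ncommitter {who}\n\nexample\n"
    return git_object_id("commit", body.encode())

print("empty tree:", empty_tree)
print("commit ids differ:", commit_id(empty_tree) != commit_id(tree_with_file))
```

Since the commit body embeds the tree id, adding a directory to "the last commit" necessarily yields a different commit id, which is the immutability point made above.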
* Re: Empty directories... 2007-07-18 0:13 Empty directories David Kastrup 2007-07-18 0:35 ` Johannes Schindelin 2007-07-18 0:39 ` Matthieu Moy @ 2007-07-18 2:23 ` Junio C Hamano 2007-07-18 5:56 ` David Kastrup 2007-07-26 23:33 ` Robin Rosenberg 3 siblings, 1 reply; 137+ messages in thread From: Junio C Hamano @ 2007-07-18 2:23 UTC (permalink / raw) To: David Kastrup; +Cc: git David Kastrup <dak@gnu.org> writes: > or has somebody a better idea or interface or rationale? I understand > that there are use cases where one does not bother about empty > directories, but for a _content_ tracker, not tracking directories > because they are empty seems quite serious. No objections as long as a patch is cleanly made without regression. It's just nobody agreed that it is "quite serious" yet so far, and no fundamental reason against it. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: Empty directories... 2007-07-18 2:23 ` Junio C Hamano @ 2007-07-18 5:56 ` David Kastrup 2007-07-18 6:34 ` Wincent Colaiuta 2007-07-18 6:53 ` Junio C Hamano 0 siblings, 2 replies; 137+ messages in thread From: David Kastrup @ 2007-07-18 5:56 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Junio C Hamano <gitster@pobox.com> writes: > David Kastrup <dak@gnu.org> writes: > >> or has somebody a better idea or interface or rationale? I understand >> that there are use cases where one does not bother about empty >> directories, but for a _content_ tracker, not tracking directories >> because they are empty seems quite serious. > > No objections as long as a patch is cleanly made without > regression. It's just nobody agreed that it is "quite serious" > yet so far, and no fundamental reason against it. Thanks. It certainly is not serious for the Linux kernel source, but seems awkward for quite a few situations. Anyway, what is your take on the situation I described? That creating some directory hierarchy (happening to contain empty directories) with some external program, adding and committing it, then switching to a different branch (or maybe doing a git-reset --hard) leaves a skeleton of empty directories around? I find this almost worse than not being able to put them into the repository: you can't get rid of them anymore either! I'd be tempted to propose that git should remove empty subdirectories when cleaning up a removed tree in the working directory, even though that violates the principle to not delete anything it isn't tracking. But since you can't get it to track the stuff in the first place... But the real fix would be to track them. Does some trick work possibly at checkin time, like putting an empty file into every empty directory, adding to the index, then removing all empty files explicitly from the index and then checking in, or is this hopeless to work around with from the user side without affecting the repository itself? 
-- David Kastrup, Kriemhildstr. 15, 44793 Bochum ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: Empty directories...
  2007-07-18 5:56 ` David Kastrup
@ 2007-07-18 6:34 ` Wincent Colaiuta
  2007-07-18 6:53 ` Junio C Hamano
  1 sibling, 0 replies; 137+ messages in thread
From: Wincent Colaiuta @ 2007-07-18 6:34 UTC (permalink / raw)
  To: David Kastrup; +Cc: Junio C Hamano, git

On 18/7/2007, at 7:56, David Kastrup wrote:

> That creating some directory hierarchy (happening to contain empty
> directories) with some external program, adding and committing it,
> then switching to a different branch (or maybe doing a git-reset
> --hard) leaves a skeleton of empty directories around?
>
> I find this almost worse than not being able to put them into the
> repository: you can't get rid of them anymore either!
>
> I'd be tempted to propose that git should remove empty subdirectories
> when cleaning up a removed tree in the working directory, even though
> that violates the principle to not delete anything it isn't tracking.
> But since you can't get it to track the stuff in the first place...
>
> But the real fix would be to track them.

Although I haven't yet been "bitten" by this issue I understand where you're coming from. This could confuse users and appear inconsistent to them (seeing as empty *files* can be tracked). I think it's probably worth tackling for that reason alone, but it will have the additional benefit of enabling other workflows like the one you describe ("installation trees for some application").

> Does some trick work possibly at checkin time, like putting an empty
> file into every empty directory, adding to the index, then removing
> all empty files explicitly from the index and then checking in, or is
> this hopeless to work around with from the user side without affecting
> the repository itself?

I wouldn't recommend any "tricks" here. I think the real solution is to allow the tracking of empty trees; everything else seems like a kludge.
And then, as you've noted already that will allow Git to handle the "skeleton of empty directories" left behind problem that you describe. Cheers, Wincent ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: Empty directories...
  2007-07-18 5:56 ` David Kastrup
  2007-07-18 6:34 ` Wincent Colaiuta
@ 2007-07-18 6:53 ` Junio C Hamano
  [not found] ` <867ioyqhgc.fsf@lola.quinscape.zz>
  ` (2 more replies)
  1 sibling, 3 replies; 137+ messages in thread
From: Junio C Hamano @ 2007-07-18 6:53 UTC (permalink / raw)
  To: David Kastrup; +Cc: git

David Kastrup <dak@gnu.org> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> No objections as long as a patch is cleanly made without
>> regression. It's just nobody agreed that it is "quite serious"
>> yet so far, and no fundamental reason against it.
>
> Thanks. It certainly is not serious for the Linux kernel source, but
> seems awkward for quite a few situations. Anyway, what is your take
> on the situation I described?

Didn't I say I do not have an objection for somebody who wants to track empty directories, already? I probably would not do that myself but I do not see a reason to forbid it, either.

The right approach to take probably would be to allow entries of mode 040000 in the index. Traditionally, we allowed only 100644 (blobs as regular files) and 120000 (blobs as symlinks). We recently added 160000 (commit from outer space, aka subproject).

And we do that for all directories, not just empty ones. So if you have fileA, empty/, sub/fileB tracked, your index would probably have these four entries, immediately after read-tree of an existing tree object:

    100644 15db6f1f27ef7a... 0 fileA
    040000 4b825dc642cb6e... 0 empty
    040000 e125e11d3b63e3... 0 sub
    100644 52054201c2a872... 0 sub/fileB

Making sure that empty/ directory exists in the working tree is probably done in entry.c; we have been touching that area in an unrelated thread in the past few days.
If you add sub/fileC, with "update-index" (and "add"), you invalidate the SHA-1 object name you stored for "sub" (because there is no point recomputing the tree object until you know you need a subtree for "sub" part, which does not happen until the next "write-tree"), and end up with something like:

    100644 15db6f1f27ef7a... 0 fileA
    040000 4b825dc642cb6e... 0 empty
    040000 00000000000000... 0 sub
    100644 52054201c2a872... 0 sub/fileB
    100644 705bf16c546f32... 0 sub/fileC

These "missing" SHA-1 would need to be recomputed on-demand.

We have had necessary infrastructure to do this "keeping untouched tree object names in the index" for quite some time, but it is not a part of the index proper (it is stored in an extension section in the index file, to keep the index compatible with older versions of git).

Having made it sound so easy, here are the issues I would expect to be nontrivial (but probably not rocket surgery either).

* unpack-trees, which is the workhorse for twoway merge (aka "switching branches") and threeway merge, has a convoluted logic to avoid D/F conflicts; it can probably be cleaned up once we do the above conversion so that the index starts saying "Hey, I have a directory here" more explicitly. The end result would probably be code that is easier to follow.

* status, update-index --refresh, and diff-files care about the information cached in the index from the last time lstat(2) was run on each entry. What we should store there for "tree" entries is very unclear to me, but probably we should teach them to ignore the stat-matching logic for these entries.

* diff-index walks the index and a tree in parallel but does not currently expect to see a tree object in the index. It needs to be taught to ignore these "tree" entries.

* merge-recursive and merge-index walk the index, coming up with the merge results one path at a time. They also need to be taught to ignore these "tree" entries.
* diff-index and "read-tree -m" should be taught to take advantage of the "tree" entries in the index. For example, if diff-index finds the "tree" entry in the index and the subtree found from the tree object exactly match, it does not even have to descend into the tree, which would be a huge performance win (because you do not have to open the subtree and its subtrees from the tree side; you already have read everything on the index side, and still have to skip the entries in the directory). "read-tree -m" also should be able to optimize two identical subtrees in the 2 or 3 trees involved.

Even if we follow the "lazy invalidate" strategy to maintain the "tree" entries in the normal codepath, we could have a special operation that says "now update all the tree entries by recomputing the tree object names as needed". Perhaps we might want to initiate such an operation before "read-tree -m" automatically.

^ permalink raw reply [flat|nested] 137+ messages in thread
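The "lazy invalidate" idea in this message can be sketched as a toy data structure. The Python below is only a model under stated assumptions: the `Entry` class, the `add_file` helper and the dict-based index are hypothetical, the object names are the abbreviated ones from the example listing, and `None` stands in for the all-zero "needs recomputing" name.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    mode: str
    oid: Optional[str]   # None models the all-zero "recompute on demand" name
    path: str

def add_file(index: dict[str, Entry], path: str, oid: str) -> None:
    """Add a blob entry and invalidate the cached names of its ancestor trees."""
    index[path] = Entry("100644", oid, path)
    parent = path.rpartition("/")[0]
    while parent:
        # The tree name is not recomputed here; it is only marked stale.
        index[parent] = Entry("040000", None, parent)
        parent = parent.rpartition("/")[0]

# State after reading the example tree: fileA, empty/, sub/fileB.
index = {
    e.path: e
    for e in [
        Entry("100644", "15db6f1f27ef7a", "fileA"),
        Entry("040000", "4b825dc642cb6e", "empty"),
        Entry("040000", "e125e11d3b63e3", "sub"),
        Entry("100644", "52054201c2a872", "sub/fileB"),
    ]
}

add_file(index, "sub/fileC", "705bf16c546f32")

for path in sorted(index):          # plain string sort, for display only
    e = index[path]
    print(e.mode, e.oid or "0" * 14, path)
```

Only "sub" loses its cached name; "empty" keeps its name because nothing beneath it changed. A later write-tree-like step would recompute the stale entries, which this sketch does not attempt.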
[parent not found: <867ioyqhgc.fsf@lola.quinscape.zz>]
* Re: Empty directories... [not found] ` <867ioyqhgc.fsf@lola.quinscape.zz> @ 2007-07-18 23:34 ` Junio C Hamano 0 siblings, 0 replies; 137+ messages in thread From: Junio C Hamano @ 2007-07-18 23:34 UTC (permalink / raw) To: David Kastrup; +Cc: git David Kastrup <dak@gnu.org> writes: > Junio C Hamano <gitster@pobox.com> writes: > >> Having made it sound so easy, here are the issues I would expect >> to be nontrivial (but probably not rocket surgery either). >> ... > This would seem to imply that the index does not need to be > upwards-compatible: simplifying the code means that old indexes won't > be treated all too well. I did not imply any such thing, by the way. These are off the top of my head technical issues and there probably are more, but I limited the list to technical side of the things. You of course have social side to take care of. If you are breaking everybody else's index, you would need to tell everybody: "I am sorry but if you upgrade your git to this version that does what I want, you have to nuke your index and start over, so commit all changes first, and then update the git. Sorry for causing you a minor inconvenience". Everybody at this point involves (obviously) the kernel folks, wine, x.org, among many others. I suspect your saying that to them is probably not good enough for them to forgive the minor inconveniences, which means you need to convince _me_ to join you in defending, in the release notes, that this is a feature worth having even though there is a minor inconvenience to redo everybody's index files. Which I suspect is quite unlikely to happen at this moment, though... A much less troublesome approach might be to do things differently from what I outlined, to keep the index compatible as long as it does not contain an empty directory, which is what we did for subprojects support. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: Empty directories... 2007-07-18 6:53 ` Junio C Hamano [not found] ` <867ioyqhgc.fsf@lola.quinscape.zz> @ 2007-07-20 8:29 ` Johan Herland 2007-07-20 8:41 ` David Kastrup 2007-07-22 21:35 ` David Kastrup 2 siblings, 1 reply; 137+ messages in thread From: Johan Herland @ 2007-07-20 8:29 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, David Kastrup On Wednesday 18 July 2007, Junio C Hamano wrote: > Didn't I say I do not have an objection for somebody who wants > to track empty directories, already? I probably would not do > that myself but I do not see a reason to forbid it, either. > > The right approach to take probably would be to allow entries of > mode 040000 in the index. Traditionally, we allowed only 100644 > (blobs as regular files) and 120000 (blobs as symlinks). We > recently added 160000 (commit from outer space, aka subproject). > > And we do that for all directories, not just empty ones. So if > you have fileA, empty/, sub/fileB tracked, your index would > probably have these four entries, immediately after read-tree > of an existing tree object: Sorry for jumping in late... Why do you want to add _all_ directories, and not just the ones we want to explicitly track (independent of whether they're empty or not). Basically, add a "--dir" flag to git-add, git-rm and friends, to tell them you're acting on the directory itself (rather than its (recursive) contents). "git-add --dir foo" will add the "040000 123abc... 0 foo" to the index/tree whether or not foo is an empty directory. "git-rm --dir foo" will remove that entry (or fail if it doesn't exist), but _not_ the contents of foo. Since we're making directory tracking _explicit_, this should all be trivially backward-compatible. ...Johan -- Johan Herland, <johan@herland.net> www.herland.net ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: Empty directories... 2007-07-20 8:29 ` Johan Herland @ 2007-07-20 8:41 ` David Kastrup 2007-07-20 10:20 ` Johan Herland 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-20 8:41 UTC (permalink / raw) To: git Johan Herland <johan@herland.net> writes: > On Wednesday 18 July 2007, Junio C Hamano wrote: >> Didn't I say I do not have an objection for somebody who wants >> to track empty directories, already? I probably would not do >> that myself but I do not see a reason to forbid it, either. >> >> The right approach to take probably would be to allow entries of >> mode 040000 in the index. Traditionally, we allowed only 100644 >> (blobs as regular files) and 120000 (blobs as symlinks). We >> recently added 160000 (commit from outer space, aka subproject). >> >> And we do that for all directories, not just empty ones. So if >> you have fileA, empty/, sub/fileB tracked, your index would >> probably have these four entries, immediately after read-tree >> of an existing tree object: > > Sorry for jumping in late... It could have given you a chance to read up on what has already been discussed. > Why do you want to add _all_ directories, and not just the ones we > want to explicitly track (independent of whether they're empty or > not). Because the problematic cases are more often than not the _implicit_ cases. Do you check a directory tree for empty directories before you archive it? In order to archive every empty directory explicitly? If you did that, you could equally maintain a script that manually does mkdir/rmdir. > Basically, add a "--dir" flag to git-add, git-rm and friends, to > tell them you're acting on the directory itself (rather than its > (recursive) contents). "git-add --dir foo" will add the "040000 > 123abc... 0 foo" to the index/tree whether or not foo is an empty > directory. "git-rm --dir foo" will remove that entry (or fail if it > doesn't exist), but _not_ the contents of foo. 
There is nothing wrong with implementing something like this in _addition_ to treating directory entries implicitly. For example, ls has an option -d which does just that, and even git-ls-files has an option --directory. Heck, I even have rm --help Usage: rm [OPTION]... FILE... Remove (unlink) the FILE(s). -d, --directory unlink FILE, even if it is a non-empty directory (super-user only; this works only if your system supports `unlink' for nonempty directories) [...] which works on just the directory and not on the contents. So a --directory option for appropriate commands would be natural for _explicit_ manipulation of such entries. But the important, the _really_ important thing are the implicit behaviors. If I have to hassle with every directory myself, I don't need a content tracking system. The --directory stuff, in contrast, are things nice to have when the framework is in place (and may be even necessary for some direct manual maintenance tasks), but they don't really concern the framework. -- David Kastrup ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: Empty directories...
  2007-07-20 8:41 ` David Kastrup
@ 2007-07-20 10:20 ` Johan Herland
  2007-07-20 10:54 ` David Kastrup
  0 siblings, 1 reply; 137+ messages in thread
From: Johan Herland @ 2007-07-20 10:20 UTC (permalink / raw)
  To: git; +Cc: David Kastrup

On Friday 20 July 2007, David Kastrup wrote:
> Johan Herland <johan@herland.net> writes:
> > Sorry for jumping in late...
>
> It could have given you a chance to read up on what has already been
> discussed.

I have tried to keep on top of the discussion so far.

> > Why do you want to add _all_ directories, and not just the ones we
> > want to explicitly track (independent of whether they're empty or
> > not).
>
> Because the problematic cases are more often than not the _implicit_
> cases. Do you check a directory tree for empty directories before you
> archive it? In order to archive every empty directory explicitly?

No, of course I don't. But then archiving (as in tar) is intended to recreate the "working copy" exactly as it was. Git (and other SCMs), however, is only interested in recreating the part of the working copy it explicitly tracks.

Given the following working copy:

    /
    /tracked/
    /tracked/file
    /tracked/dir/
    /untracked/
    /untracked/file
    /untracked/dir/

and the following commands:

    $ git add tracked
    $ git clone

The cloned result could be any of the following:

(1)
    /
    /tracked/
    /tracked/file

This is the current behaviour; directories are not tracked at all, but only added as necessary to support files.

(2)
    /
    /tracked/
    /tracked/file
    /tracked/dir/
    /untracked/
    /untracked/dir/

i.e. implicitly tracking _all_ directories. This is what you literally ask for, but I think most would find this unreasonable.

(3)
    /
    /tracked/
    /tracked/file
    /tracked/dir/

i.e. recursively tracking directories (and files). This seems useful, but there is nothing _implicit_ about this.

I have a feeling that you're actually arguing for doing (3) by default.
What I am arguing is to do (1) by default, and (3) if given a suitable command-line option (i.e. "git add --with-dirs tracked"). Note that this is really an interface question. How these entries are actually stored in the repo is a different discussion. Finally, let's look at the case of "git add tracked/file" followed by "git rm tracked/file". I'm arguing that "tracked/" should be automatically removed, since I never asked for it to be tracked by git. On the other hand, "git-add --non-recursive tracked" followed by the above two commands, should of course leave "tracked/" in place, since I now actually asked explicitly for the directory to be tracked. My point is fundamentally that selectively tracking directories is a more powerful concept than just tracking _all_ directories by default. Note that if we support selectively tracking directories, tracking _everything_ (like you seem to want) is trivially implemented by _always_ supplying the appropriate option to git-add. If we track everything by design, we don't have the option of selectively tracking some directories. > > Basically, add a "--dir" flag to git-add, git-rm and friends, to > > tell them you're acting on the directory itself (rather than its > > (recursive) contents). "git-add --dir foo" will add the "040000 > > 123abc... 0 foo" to the index/tree whether or not foo is an empty > > directory. "git-rm --dir foo" will remove that entry (or fail if it > > doesn't exist), but _not_ the contents of foo. > > There is nothing wrong with implementing something like this in > _addition_ to treating directory entries implicitly. I don't agree. By _selectively_ tracking directories you can implement any policy you want on top of it. > For example, ls > has an option -d which does just that, and even git-ls-files has an > option --directory. Heck, I even have Yes, having commandline options for explicitly specifying directories (and not their contents) is _exactly_ what I want. 
> But the important, the _really_ important thing are the implicit > behaviors. If I have to hassle with every directory myself, I don't > need a content tracking system. I disagree. Just as you have to decide which files to track, you similarly should have to decide which directories to track. Of course, the tools make this easier for you by being able to recursively handle files. In the same way they should be able to do the same thing for directories. Have fun! ...Johan -- Johan Herland, <johan@herland.net> www.herland.net ^ permalink raw reply [flat|nested] 137+ messages in thread
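The three possible clone results enumerated in this message can be written down as a small function, which makes the differences between the policies easy to compare. This is an illustrative Python sketch, not a proposal for git's interface; the `clone_result` function and its numeric `policy` argument are hypothetical and simply mirror the (1)/(2)/(3) numbering used above.

```python
# The example working copy from the message, relative to the repository root.
FILES = ["tracked/file", "untracked/file"]
DIRS = ["tracked", "tracked/dir", "untracked", "untracked/dir"]

def under(path: str, root: str) -> bool:
    return path == root or path.startswith(root + "/")

def ancestors(path: str) -> set[str]:
    parts = path.split("/")[:-1]
    return {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}

def clone_result(added: str, policy: int) -> list[str]:
    """Paths present after 'git add <added>' and a clone, under policy 1, 2 or 3."""
    files = {f for f in FILES if under(f, added)}
    dirs: set[str] = set()
    for f in files:
        dirs |= ancestors(f)                 # directories needed to hold files
    if policy == 2:
        dirs |= set(DIRS)                    # every directory in the working copy
    elif policy == 3:
        dirs |= {d for d in DIRS if under(d, added)}   # only below the added path
    return sorted(files | {d + "/" for d in dirs})

for policy in (1, 2, 3):
    print(policy, clone_result("tracked", policy))
```

Policy 1 yields only `tracked/` and `tracked/file`; policy 3 additionally keeps `tracked/dir/`; policy 2 also drags in `untracked/` and `untracked/dir/`, which is the outcome neither participant actually wants.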
* Re: Empty directories... 2007-07-20 10:20 ` Johan Herland @ 2007-07-20 10:54 ` David Kastrup 2007-07-20 12:18 ` Johan Herland 0 siblings, 1 reply; 137+ messages in thread From: David Kastrup @ 2007-07-20 10:54 UTC (permalink / raw) To: git Johan Herland <johan@herland.net> writes: > On Friday 20 July 2007, David Kastrup wrote: >> Johan Herland <johan@herland.net> writes: >> > Sorry for jumping in late... >> >> It could have given you a chance to read up on what has already been >> discussed. > > I have tried to keep on top of the discussion so far. > >> > Why do you want to add _all_ directories, and not just the ones we >> > want to explicitly track (independent of whether they're empty or >> > not). >> >> Because the problematic cases are more often than not the >> _implicit_ cases. Do you check a directory tree for empty >> directories before you archive it? In order to archive every empty >> directory explicitly? > > No, of course I don't. But then archiving (as in tar) is intended to > recreate the "working copy" exactly as it was. Git (and other SCMs), > however, is only interested in recreating the part of the working > copy it explicitly tracks. Yes, and git-add some-dir tells it to track _everything_ inside some-dir. Which means that the included files are tracked _implicitly_. The included directories (including some-dir itself) are not. > Given the following working copy: > / > /tracked/ > /tracked/file > /tracked/dir/ > /untracked/ > /untracked/file > /untracked/dir/ > > and the following commands: > $ git add tracked > > $ git clone > > The cloned result could be any of the following: > > (1) > / > /tracked/ > /tracked/file > > This is the current behaviour; directories are not tracked at all, but only > added as necessary to support files. And so your case (1) actually rather is a single line: /tracked/file Everything else is just part of representing /tracked/file and disappears as soon as /tracked/file disappears. 
> (2) > / > /tracked/ > /tracked/file > /tracked/dir/ > /untracked/ > /untracked/dir/ > > i.e. implicitly tracking _all_ directories. This is what you literally ask > for, I don't see how you can possibly conclude that from what I have been writing. > but I think most would find this unreasonable. And it is. So please _don't_ put words into my mouth. In my proposal, the following (and nothing else) would get tracked: /tracked/. /tracked/file and that's it. That is what was requested, and that is what is tracked. There will be, incidentally, a tree "/tracked/" and a tree "/" in the _repository_, but those collapse as soon as they are empty. They are just an _abstract_ data structuring tool in the repository that is _mapped_ to directories on checkout. > (3) > / > /tracked/ > /tracked/file > /tracked/dir/ > > i.e. recursively tracking directories (and files). This seems useful, but > there is nothing _implicit_ about this. You did not ask for "/tracked/file" and you did not ask for "/tracked/dir/" (whatever they may be). That you wanted to track them was _implied_ by your request of "/tracked/". > I have a feeling that you're actually arguing for doing (3) by > default. What I am arguing is to do (1) by default, and (3) if > given a suitable command-line option (i.e. "git add --with-dirs > tracked"). > > Note that this is really an interface question. Not at all. It is a _conceptual_ question: in order for this to work at _all_ (instead of being an inconsistent heap of ugly surprises), directories need a representation in the repo. This representation, as opposed to in the work file system, is _optional_: the repository got perfectly well along without it up to now, and the fallback is already implemented when there is a tree without corresponding directory. > How these entries are actually stored in the repo is a different > discussion. Sure.
But anything that requires four dozens of special cases instead of four because one wanted to keep "things that are under some specialized view separate separate" is not something I am going to implement. I am too old to juggle with complexity for the sake of complexity. I can make much more use of the existing infrastructure by actually making file and directory entries quite similar. ls -la also has no special cases for "." and ".." because they are, at a very fundamental level, very special in achieving a special purpose _without_ being special-cased. > Finally, let's look at the case of "git add tracked/file" followed > by "git rm tracked/file". I'm arguing that "tracked/" should be > automatically removed, since I never asked for it to be tracked by > git. Sure. And nobody ever said otherwise. In fact, I gave about a dozen examples in that line and more special in the thread up to now. > On the other hand, "git-add --non-recursive tracked" followed by the > above two commands, should of course leave "tracked/" in place, > since I now actually asked explicitly for the directory to be > tracked. Sure. Use "--directory" instead of "--non-recursive" and you have a somewhat more special option for that. > My point is fundamentally that selectively tracking directories is a > more powerful concept than just tracking _all_ directories by > default. Perhaps you might read up on some of the past discussion before beating dead horses. This has been covered already, and more than once. I never asked for "all directories" to be tracked. I outlined cases where they are tracked and where not, and I tested that the mechanisms in "man gitignore" already work _perfectly_ with the pattern "." for configuring the _implied_ tracking at directory, repository, project, and user preference level. 
> Note that if we support selectively tracking directories, tracking > _everything_ (like you seem to want) is trivially implemented by > _always_ supplying the appropriate option to git-add. If we track > everything by design, we don't have the option of selectively > tracking some directories. But that means manual intervention all of the time. It is fine when a tool provides an option to shoot you in the arm instead of in the foot as usual, but that's not really a fix, but an exacerbation of the problem. >> > Basically, add a "--dir" flag to git-add, git-rm and friends, to >> > tell them you're acting on the directory itself (rather than its >> > (recursive) contents). "git-add --dir foo" will add the "040000 >> > 123abc... 0 foo" to the index/tree whether or not foo is an empty >> > directory. "git-rm --dir foo" will remove that entry (or fail if >> > it doesn't exist), but _not_ the contents of foo. >> >> There is nothing wrong with implementing something like this in >> _addition_ to treating directory entries implicitly. > > I don't agree. By _selectively_ tracking directories you can > implement any policy you want on top of it. No, you can't. Because a "policy" means that things are _implied_. Being able to do everything manually is not a policy. It may be a lifesaver at times, but then you have little business drifting in the river in the first place. >> But the important, the _really_ important thing are the implicit >> behaviors. If I have to hassle with every directory myself, I >> don't need a content tracking system. > > I disagree. Just as you have to decide which files to track, you > similarly should have to decide which directories to track. Of > course, the tools make this easier for you by being able to > recursively handle files. In the same way they should be able to do > the same thing for directories. --directory _explicitly_ is not working recursively, so it does not solve that problem.
-- David Kastrup
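For readers following the thread, the difference between implied trees and the proposed explicit "." entries can be sketched as below. This is only a toy model under stated assumptions: the function name `directories_on_checkout` and the representation of the index as a set of path strings are invented here for illustration, and none of it is git's actual data structure or part of any patch in this thread.

```python
def directories_on_checkout(entries):
    """Return the directories a checkout would create in this toy model.

    `entries` is a set of index paths. A path whose last component is "."
    stands for the hypothetical explicit directory entry from the proposal;
    every other path is a regular file.
    """
    dirs = set()
    for entry in entries:
        parts = entry.split("/")
        if parts[-1] == ".":
            # Explicit entry: the directory itself is tracked content.
            parts = parts[:-1]
            if parts:
                dirs.add("/".join(parts))
        # Parent directories exist only to hold whatever lies below them.
        for depth in range(1, len(parts)):
            dirs.add("/".join(parts[:depth]))
    return dirs


# "git-add tracked" under the proposal: the directory and its file.
explicit = {"tracked/.", "tracked/file"}
# Current behaviour: only the file; the directory is merely implied.
implicit = {"tracked/file"}

explicit.discard("tracked/file")
implicit.discard("tracked/file")

print(sorted(directories_on_checkout(explicit)))  # ['tracked']
print(sorted(directories_on_checkout(implicit)))  # []
```

In this model the implied directory collapses as soon as its last file is removed, while the directory carrying a "." entry survives until that entry is removed as well.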
* Re: Empty directories... 2007-07-20 10:54 ` David Kastrup @ 2007-07-20 12:18 ` Johan Herland [not found] ` <86odi7utdj.fsf@lola.quinscape.zz> 0 siblings, 1 reply; 137+ messages in thread From: Johan Herland @ 2007-07-20 12:18 UTC (permalink / raw) To: David Kastrup; +Cc: git On Friday 20 July 2007, David Kastrup wrote: > Johan Herland <johan@herland.net> writes: > > My point is fundamentally that selectively tracking directories is a > > more powerful concept than just tracking _all_ directories by > > default. > > Perhaps you might read up on some of the past discussion before > beating dead horses. This has been covered already, and more than > once. I never asked for "all directories" to be tracked. I outlined > cases where they are tracked and where not, and I tested that the > mechanisms in "man gitignore" already work _perfectly_ with the > pattern "." for configuring the _implied_ tracking at directory, > repository, project, and user preference level. It seems our discussion is based on so many misunderstandings of each other that it's not very useful to reply to specific parts of it. AFAICS, from a high-level POV, we're pretty much in agreement on the following points: 1. Git should be able to track directories. 2. Tracked directories should be kept alive, even if empty. 3. Git must not necessarily track _all_ directories. Conversely, we seem to disagree on these points: 4. Whether or not git should track directories by default. You say yes, I say no. 5. How the tracking of directories should be implemented in git's object database. I want to keep the index/tree as-is except for adding directory entries (w/mode 040000) for the tracked directories only. You seem to want to add directory entries for _all_ directories and then additional "." entries for directories you don't want deleted if/when empty. Am I making sense, or have I misunderstood our misunderstandings? 
...Johan -- Johan Herland, <johan@herland.net> www.herland.net
* Re: Empty directories... [not found] ` <86odi7utdj.fsf@lola.quinscape.zz> @ 2007-07-20 13:20 ` Johan Herland 2007-07-20 13:33 ` David Kastrup 0 siblings, 1 reply; 137+ messages in thread From: Johan Herland @ 2007-07-20 13:20 UTC (permalink / raw) To: David Kastrup; +Cc: git On Friday 20 July 2007, David Kastrup wrote: > Johan Herland <johan@herland.net> writes: > > > AFAICS, from a high-level POV, we're pretty much in agreement on the > > following points: > > > > 1. Git should be able to track directories. > > > > 2. Tracked directories should be kept alive, even if empty. > > > > 3. Git must not necessarily track _all_ directories. > > > > > > Conversely, we seem to disagree on these points: > > > > 4. Whether or not git should track directories by default. You say > > yes, I say no. > > Element of least surprise. But since my proposal allows easy and > intuitive declaration of the preference at user, project, and > directory level without one choice messing with the choice of other > projects and contributors with mixed preferences, this is quite > unimportant. > > We are in agreement that adding or removing the tracking explicitly > for a single directory might be useful to have. But it can't be the > only way. As long as you can add/remove tracking recursively for a whole (sub)tree, I don't see what's the problem. Of course, if you want to change the default behaviour, you should be able either set a config variable somewhere, or - as a last resort - alias git-add and git-rm to always supply the appropriate command-line option. > > 5. How the tracking of directories should be implemented in git's > > object database. I want to keep the index/tree as-is except for > > adding directory entries (w/mode 040000) for the tracked directories > > only. You seem to want to add directory entries for _all_ > > directories and then additional "." entries for directories you > > don't want deleted if/when empty. > > No. 
I don't want to change _anything_ for untracked directories. > They are, as previously, implied by the contents and have a "tree" > entry for efficiency reasons. Nothing new here. > > The directory mode entries are named "." and are for tracked > directories only. Ok. So our difference in opinion on implementation is even smaller than I imagined; basically only whether the directory is tracked by a mode "040000" entry, or by a "." entry. > > Am I making sense, or have I misunderstood our misunderstandings? > > The latter. You are violently arguing for what I outlined. Which > probably shows that I am not the best at explaining my ideas, and that > it reflects badly upon them. That probably goes for both of us :) Well, as long as we have this clarified, I don't see much point in continuing this part of the thread. I feel confident that the git community as a whole will converge on the best technical solution, once it surfaces. Have fun! ...Johan -- Johan Herland, <johan@herland.net> www.herland.net
* Re: Empty directories... 2007-07-20 13:20 ` Johan Herland @ 2007-07-20 13:33 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-20 13:33 UTC (permalink / raw) To: git Johan Herland <johan@herland.net> writes: > On Friday 20 July 2007, David Kastrup wrote: >> Johan Herland <johan@herland.net> writes: > >> > 4. Whether or not git should track directories by default. You >> > say yes, I say no. >> >> Element of least surprise. But since my proposal allows easy and >> intuitive declaration of the preference at user, project, and >> directory level without one choice messing with the choice of other >> projects and contributors with mixed preferences, this is quite >> unimportant. >> >> We are in agreement that adding or removing the tracking explicitly >> for a single directory might be useful to have. But it can't be >> the only way. > > As long as you can add/remove tracking recursively for a whole > (sub)tree, I don't see what's the problem. Neither do I. But a --directory option never is recursive. That is the whole point. Probably we are in violent agreement again. > Of course, if you want to change the default behaviour, you should > be able either set a config variable somewhere, or - as a last > resort - alias git-add and git-rm to always supply the appropriate > command-line option. Or declare diverging behaviors using a !. or . entry in the gitignore mechanisms. Which work everywhere where we need them. >> > 5. How the tracking of directories should be implemented in git's >> > object database. I want to keep the index/tree as-is except for >> > adding directory entries (w/mode 040000) for the tracked >> > directories only. You seem to want to add directory entries for >> > _all_ directories and then additional "." entries for directories >> > you don't want deleted if/when empty. >> >> No. I don't want to change _anything_ for untracked directories. 
>> They are, as previously, implied by the contents and have a "tree" >> entry for efficiency reasons. Nothing new here. >> >> The directory mode entries are named "." and are for tracked >> directories only. > > Ok. So our difference in opinion on implementation is even smaller > than I imagined; basically only whether the directory is tracked by > a mode "040000" entry, or by a "." entry. Actually, even smaller: I'd track them by a "." entry with mode 1777755755 or whatever is the natural expression for "this is a directory". The mode would be different from the existing "this is a tree". _If_ one wants at some point to track permissions of files apart from "x", the "." entry would be natural for carrying directory permissions. Without ".", you basically tell git "I don't care about the existence of this directory. Just do what is necessary for checking out my files". >> > Am I making sense, or have I misunderstood our misunderstandings? >> >> The latter. You are violently arguing for what I outlined. Which >> probably shows that I am not the best at explaining my ideas, and >> that it reflects badly upon them. > > That probably goes for both of us :) > > Well, as long as we have this clarified, I don't see much point in > continuing this part of the thread. I feel confident that the git > community as a whole will converge on the best technical solution, > once it surfaces. I'll probably crank out some insolently primitive proof of concept eventually. -- David Kastrup
* Re: Empty directories... 2007-07-18 6:53 ` Junio C Hamano [not found] ` <867ioyqhgc.fsf@lola.quinscape.zz> 2007-07-20 8:29 ` Johan Herland @ 2007-07-22 21:35 ` David Kastrup 2 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-22 21:35 UTC (permalink / raw) To: git Coming full circle... Junio C Hamano <gitster@pobox.com> writes: > The right approach to take probably would be to allow entries of > mode 040000 in the index. Traditionally, we allowed only 100644 > (blobs as regular files) and 120000 (blobs as symlinks). We > recently added 160000 (commit from outer space, aka subproject). > > And we do that for all directories, not just empty ones. So if > you have fileA, empty/, sub/fileB tracked, your index would > probably have these four entries, immediately after read-tree > of an existing tree object: > > 100644 15db6f1f27ef7a... 0 fileA > 040000 4b825dc642cb6e... 0 empty > 040000 e125e11d3b63e3... 0 sub > 100644 52054201c2a872... 0 sub/fileB This would be very much what I am proposing now, except that instead of 040000 we would have 040755 usually, so that when the index makes it into the repository where 040000 already has a meaning (a disappear-when-empty tree) we get the right information. Also note that the above comes about when doing git-add * but not when doing git-add fileA empty sub/fileB (in the latter case, the entry for sub would be missing) > If you add sub/fileC, with "update-index" (and "add"), you > invalidate the SHA-1 object name you stored for "sub" (because > there is no point recomputing the tree object until you know you > need a subtree for "sub" part, which does not happen until the > next "write-tree"), and end up with something like: > > 100644 15db6f1f27ef7a... 0 fileA > 040000 4b825dc642cb6e... 0 empty > 040000 00000000000000... 0 sub > 100644 52054201c2a872... 0 sub/fileB > 100644 705bf16c546f32... 0 sub/fileC > > These "missing" SHA-1 would need to be recomputed on-demand. Ah, ok. 
Does it even make sense to compute the SHA-1 values in the index in advance? What would they be useful for? > We have had necessary infrastructure to do this "keeping > untouched tree object names in the index" for quite some time, > but it is not a part of the index proper (it is stored in an > extension section in the index file, to keep the index > compatible with older versions of git). What is the application for which this is being used? > Having made it sound so easy, here are the issues I would expect > to be nontrivial (but probably not rocket surgery either). > > * unpack-trees, which is the workhorse for twoway merge (aka > "switching branches") and threeway merge, has a convoluted > logic to avoid D/F conflicts; it can probably be cleaned up > once we do the above conversion so that the index starts > saying "Hey, I have a directory here" more explicitly. The > end result would probably be a code easier to follow. I am afraid that this is unlikely to happen, and that is because directory tracking remains optional at a fundamental level as long as we want to support the current behavior as an option. However, one could conceivably add 040000 entries (rather than 040755) for directories that have not been passed into tracking but are required by git, if this simplifies matters. But it sounds like something that might complicate working with several different git versions on the same index. > * status, update-index --refresh, and diff-files cares about > the information cached in the index from the last time > lstat(2) is run on each entry. What we should store there > for "tree" entries is very unclear to me, but probably we > should teach them to ignore the stat-matching logic for > these entries. At the current point of time, git tracks just the u+x bit for normal files, and for directories, there is really nothing worth tracking as long as no attempt of restoring more mode bits is done. 
Modification times are probably a bit too risky to pay attention to. > * diff-index walks the index and a tree in parallel but does > not currently expect to see a tree object in the index. It > needs to be taught to ignore these "tree" entries. Or do something sensible when comparing. Understood. > * merge-recursive and merge-index walk the index, coming up > with the merge results one path at a time. They also need to > be taught to ignore these "tree" entries. Same here. > * diff-index and "read-tree -m" should be taught to take > advantage of the "tree" entries in the index. For example, > if diff-index finds the "tree" entry in the index and the > subtree found from the tree object exactly match, it does not > even have to descend into the tree, which would be a huge > performance win (because you do not have to open the subtree > and its subtrees from the tree side; you already have read > everything on the index side, and still have to skip the > entries in the directory). "read-tree -m" also should be > able to optimize two identical subtrees in the 2 or 3 trees > involved. > > Even if we follow the "lazy invalidate" strategy to maintain > the "tree" entries in the normal codepath, we could have a > special operation that says "now update all the tree entries > by recomputing the tree object names as needed". Perhaps we > might want to initiate such an operation before "read-tree > -m" automatically. Over my head, but it would appear that it can safely be left for later. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum
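The "lazy invalidate" strategy quoted in this message (adding sub/fileC clears the stored name for "sub", which is recomputed only at the next write-tree) can be sketched roughly as follows. This is a simplified illustration, not git's code: the class `CachedTreeIndex`, its `recomputed` list, and the text layout that gets hashed are assumptions made here, and the hashes do not match real git tree objects. Empty directories are not modelled at all.

```python
import hashlib


class CachedTreeIndex:
    """Toy index that caches one hash per directory and invalidates lazily."""

    def __init__(self):
        self.blobs = {}        # file path -> blob hash
        self.tree_cache = {}   # directory -> cached hash, None when stale
        self.recomputed = []   # directories rehashed so far (for inspection)

    def add(self, path, content):
        self.blobs[path] = hashlib.sha1(content).hexdigest()
        # Mark every enclosing directory (including the root "") as stale;
        # nothing is rehashed at this point.
        parts = path.split("/")
        for depth in range(len(parts)):
            self.tree_cache["/".join(parts[:depth])] = None

    def write_tree(self, directory=""):
        cached = self.tree_cache.get(directory)
        if cached is not None:
            return cached  # untouched subtree: reuse without descending
        prefix = directory + "/" if directory else ""
        children = set()
        for path in self.blobs:
            if path.startswith(prefix):
                name, sep, _ = path[len(prefix):].partition("/")
                children.add((name, bool(sep)))
        lines = []
        for name, is_dir in sorted(children):
            if is_dir:
                lines.append("tree %s %s" % (self.write_tree(prefix + name), name))
            else:
                lines.append("blob %s %s" % (self.blobs[prefix + name], name))
        digest = hashlib.sha1("\n".join(lines).encode()).hexdigest()
        self.tree_cache[directory] = digest
        self.recomputed.append(directory)
        return digest


index = CachedTreeIndex()
index.add("fileA", b"a")
index.add("other/fileD", b"d")
index.add("sub/fileB", b"b")
index.write_tree()
index.recomputed.clear()

index.add("sub/fileC", b"c")   # invalidates "sub" and the root only
index.write_tree()
print(index.recomputed)        # ['sub', '']
```

The second `write_tree()` call never descends into "other", which is the same shortcut the quoted text hopes diff-index and "read-tree -m" could take when two subtree names match.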
* Re: Empty directories... 2007-07-18 0:13 Empty directories David Kastrup ` (2 preceding siblings ...) 2007-07-18 2:23 ` Junio C Hamano @ 2007-07-26 23:33 ` Robin Rosenberg 2007-07-27 5:22 ` David Kastrup 3 siblings, 1 reply; 137+ messages in thread From: Robin Rosenberg @ 2007-07-26 23:33 UTC (permalink / raw) To: David Kastrup; +Cc: git ( I don't know which mail is the best to reply to and I probably missed something in the thread, so bear with me if I'm repeating anything. ) David. Reconsider "tracking" all directories and what that would give, compared to explicitly tracking specific ones and the required magic entries. Say we have a config setting that tells git never to remove empty trees. Linus's patches could be a start for representing trees in the index. As an optimization the index could prune trees from the index if they contain things as long as the index *effectively* remembers all trees. Using the patches again we could add empty directories to the index and remove them. No directory would be removed automatically, except maybe by a merge. We would probably have only a few empty directories and new unexpected ones would only pop up when we remove all blobs from one. Git status could tell us about them so we will not forget them. It could even tell us about "new" empty directories, which is probably the most important thing you'd want to know. Forgetting to untrack an empty directory would not be a big deal. Whether to retain empty trees or not should be a repository policy, but an all or nothing setting. -- robin
* Re: Empty directories... 2007-07-26 23:33 ` Robin Rosenberg @ 2007-07-27 5:22 ` David Kastrup 0 siblings, 0 replies; 137+ messages in thread From: David Kastrup @ 2007-07-27 5:22 UTC (permalink / raw) To: Robin Rosenberg; +Cc: git Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes: > ( > I don't know which mail is the best to reply to and I probably missed > something in the thread, so bear with me if I'm repeating anything. > ) > > David. Reconsider "tracking" all directories and what that would > give, compared to explicitly tracking specific ones and the required > magic entries. It would be quite a nuisance for a patch-based workflow, since patches don't talk about the creation and deletion of directories. The "track only when entered" approach has the advantage that directories that were only created to accommodate patches will be removed again when becoming empty. Of course, doing "git-add top-level" once will level the difference. > Say we have a config setting that tells git never to remove empty > trees. Why wouldn't I have tree/zap removed when doing git-rm tree? > Linus's patches could be a start for representing trees in the > index. As an optimization the index could prune trees from the index > if they contain things as long as the index *effectively* remembers > all trees. But it doesn't. If you do git-add tree, optimizing the dir entry away since tree/zap exists, then subsequently do git-rm tree/zap, of course there is nothing to do except remove tree/zap, and the tree is gone. One can't start tracking trees explicitly only when they become empty, because one can't know whether to track them then. > Using the patches again we could add empty directories to the index > and remove them. No directory would be removed automatically, except > maybe by a merge. I currently have the problem that rm -rf * unzip some-archive git-add some-archive git-commit -a -m whatever git-checkout something else leaves empty directory skeletons lying around.
> We would probably have only a few empty directories and new > unexpected ones would only pop up when we remove all blobs from > one. Git status could tell us about them so we will not forget > them. I don't want a source management system to tell me whenever it is going to annoy me. > It could even tell us about "new" empty directories, which is > probably the most important thing you'd want to know. > > Forgetting to untrack an empty directory would not be a big deal. > > Whether to retain empty trees or not should be a repository policy, > but an all or nothing setting. With that approach, the workflow "Apply a patch creating something/hello" "Undo the patch creating something/hello" will leave something lying around. For somebody managing hundreds of directories, that would be a nuisance. I don't say that a "track all parents automatically" approach would not have its merits: it would likely prevent some mistakes and be easily understandable to most users. But for managing a patch workflow, it would appear to get in the way. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum
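As a stopgap for the empty directory skeletons described in this message, a small script can at least list what a branch switch left behind. The sketch below is an illustration only: the helper name `empty_directories` is made up here, and it simply walks the working tree while skipping the `.git` directory, much like `find -type d -empty` would. It does not consult git at all, so it cannot tell a leftover directory from one that is meant to be there.

```python
import os


def empty_directories(root):
    """Return working-tree directories under `root` that contain nothing.

    Paths are relative to `root`; the repository's own .git directory is
    not descended into.
    """
    found = []
    for current, subdirs, files in os.walk(root):
        is_empty = not subdirs and not files
        if ".git" in subdirs:
            subdirs.remove(".git")  # prune in place so os.walk skips it
        if is_empty:
            found.append(os.path.relpath(current, root))
    return sorted(found)


if __name__ == "__main__":
    # List leftovers in the current working tree, one per line.
    for path in empty_directories("."):
        print(path)
```

A directory that holds only other empty directories is not itself reported, so removing the listed paths may have to be repeated until the list is empty.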