git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Kastrup <dak@gnu.org>
Cc: git@vger.kernel.org
Subject: Re: [RFC PATCH] Re: Empty directories...
Date: Sat, 21 Jul 2007 08:53:39 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LFD.0.999.0707210832180.27249@woody.linux-foundation.org> (raw)
In-Reply-To: <85odi69vgt.fsf@lola.goethe.zz>



On Sat, 21 Jul 2007, David Kastrup wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > Of course, it seldom matters, but basically, you should test a directory 
> > structure that has the files
> >
> > 	dir.c
> > 	dir/test
> >
> > in it, and the "dir" directory should always sort _after_ "dir.c".
> >
> > And yes, having the index entry with a '/' at the end would handle
> > that automatically.
> 
> You completely lost me here.  I guess I'll be able to pick this up
> only after investing considerable more time into the data structures.

So the basic issue is that not only does git obviously think that only 
content matters, but it describes it with a single SHA1. 

That's not an issue at all for a single file, but if you want to describe 
*multiple* files with a single SHA1 (which git obviously very much wants 
to do), the way you generate the SHA1 matters a lot.

In particular, the order.

So git is very very strict about the ordering of tree structures. A tree 
structure is not just a random list of

	<ASCII mode> + <space> + <filename> + <NUL> + <SHA1>

it's very much an _ordered_ list of those things, because we want the SHA1 
of the tree to be well-specified by the contents, and that means that the 
contents of a tree object has have absolutely _zero_ ambiguity.

This means, for example, that git is very fundamentally case sensitive. 
There's no sane way *not* to be, because if you're case insensitive in any 
way at all, you'll end up having two trees that are "the same", but end up 
having different SHA1's.

It also means that git objects have absolutely zero "localization". There 
is no locale at all, and there very fundamnetally *must*not* be. Again, 
for the same reason: if you can describe the same filename with two 
different encodings, you'd have two different SHA1's for the same content.

So git filenames are very much a "stream of bytes", not anything else. And 
they need to sort 100% reliably, always the same way, and never with any 
localized meaning.

And, partly because it seemed most natural, and partly for historical 
reasons, the way git sorts filenames is by sorting by *pathname*. So if 
you have three files named

	a.c
	a/c
	abc

then they sort in that exact order, and no other! They sort as a "memcmp" 
in the full pathname, and that's really nice when you see whole 
collections of files, and you know the list is globally sorted.

So that "global pathname sorting" has nice properties, and it seems 
"obvious", but it means that because git actually *encodes* those three 
files hierarchically as two different trees (because there's a 
subdirectory there), the tree objects themselves sort a bit oddly. The 
tree obejcts themselves will look like

 top-level tree:
	100644 a.c -> blob1
	040000 a   -> tree2
	100644 abc -> blob3

 sub-tree:
	100644 c    -> blob2

and notice how the *tree* is not sorted alphabetically at all. It has a 
subtly different sort, where the entry "a" sorts *after* the entry "a.c", 
because we know that it's a tree entry, and thus will (in the *global* 
order) sort as if it had a "/" at the end!

Traditionally, when we have the index, the index sorting has been very 
simple: you just sort the names as memcmp() would sort them. But note how 
that changes, if "a" is an empty directory. Now the index needs to sort as

	file a.c
	dir  a
	file abc

because when we create the tree entry, it needs to be sorted the same way 
all tree entries are always sorted - as if "a" had a slash at the end!

[ Yeah, yeah, we could make a special case and just say "the empty tree 
  sorts differently", but that actually results in huge problems when 
  doing a "diff" between two trees: our diff machinery very much depends 
  on the fact that the index and the trees always sort the same way, and 
  if we sorted the "a" entry (when it is an empty directory) differently 
  from the "a" entry (when it has entries in it), that would just be 
  insane and cause no end of trouble for comparing two trees - one with an 
  empty directory and one with content added to that directory.

  So the sorting is doubly important: it's what makes "one content" always 
  have the same SHA1, but it is also much easier and efficient to compare 
  directories when we know they are sorted the same way. ]

In other words, introducing tree entries in the index ended up also 
introducing all the issues that we already had with the tree objects since 
they got split up hierarchically, but that the code didn't use to have to 
care about.

The easiest way to solve this really does seem to be to add the rule that 
the index entry for an empty directory has to have the "/" at the end of 
the name - then the "sort mindlessly by name" will just continue to work.

But that was what I said was broken: my patches I sent out didn't actually 
do that.

It's *probably* just a few lines of code, and it actually would result in 
some nice changes ("git ls-files" would show a '/' at the end of an empty 
directory entry, for example), so this is not a big deal, but it's an 
example of how subtly different a directory is from a file when it comes 
to git.

			Linus

  reply	other threads:[~2007-07-21 15:54 UTC|newest]

Thread overview: 137+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-18  0:13 Empty directories David Kastrup
2007-07-18  0:35 ` Johannes Schindelin
2007-07-18  6:07   ` David Kastrup
2007-07-18 10:26     ` Johannes Schindelin
     [not found]       ` <86tzs2m1h7.fsf@lola.quinscape.zz>
2007-07-18 11:24         ` Johannes Schindelin
2007-07-18 11:40           ` Matthieu Moy
2007-07-18 12:12             ` David Kastrup
2007-07-18 16:23     ` Linus Torvalds
2007-07-18 16:33       ` Linus Torvalds
2007-07-18 17:38         ` David Kastrup
2007-07-18 18:05           ` Linus Torvalds
2007-07-18 16:39       ` Matthieu Moy
2007-07-18 17:06         ` Linus Torvalds
2007-07-18 21:37           ` David Kastrup
2007-07-18 21:45             ` Linus Torvalds
2007-07-18 23:13               ` David Kastrup
2007-07-18 23:16               ` [RFC PATCH] " Linus Torvalds
2007-07-18 23:40                 ` Linus Torvalds
2007-07-18 23:42                 ` David Kastrup
2007-07-19  0:22                   ` Linus Torvalds
2007-07-19  5:28                     ` Junio C Hamano
2007-07-19  5:38                       ` Shawn O. Pearce
2007-07-19  6:08                         ` David Kastrup
2007-07-19  7:10                           ` Geoff Russell
2007-07-19  6:09                         ` Shawn O. Pearce
2007-07-19  8:13                           ` Matthieu Moy
2007-07-19 10:51                             ` Tomash Brechko
2007-07-19 11:31                               ` David Kastrup
2007-07-19 12:32                                 ` Tomash Brechko
2007-07-19 12:46                                   ` David Kastrup
2007-07-23 20:18                                     ` Nix
2007-07-23 20:49                                       ` David Kastrup
2007-07-23 21:49                                         ` Nix
2007-07-23 22:05                                           ` Nix
2007-07-23 22:52                                             ` Jakub Narebski
2007-07-25 22:43                                               ` Nix
2007-07-23 22:16                                           ` David Kastrup
2007-07-23 22:31                                             ` Linus Torvalds
2007-07-23 23:32                                               ` Nix
2007-07-23 23:57                                                 ` Linus Torvalds
     [not found]                                               ` <86ps2ithyl.fsf@lola.quinscape.zz>
2007-07-24  6:56                                                 ` Nix
2007-07-19 12:38                                 ` David Kastrup
2007-07-19 13:21                                   ` David Kastrup
2007-07-19 12:16                               ` Johannes Schindelin
2007-07-19 12:24                                 ` David Kastrup
2007-07-19 14:44                                   ` Brian Gernhardt
2007-07-19 15:43                                     ` Johannes Schindelin
2007-07-19 16:06                                       ` Brian Gernhardt
2007-07-19 16:17                                         ` Johannes Schindelin
2007-07-19 16:28                                           ` David Kastrup
2007-07-19 16:34                                           ` Brian Gernhardt
2007-07-19 17:30                                             ` Johannes Schindelin
     [not found]                                             ` <Pine.LNX.4.64.070719 1829530.14781@racer.site>
2007-07-19 17:47                                               ` David Kastrup
2007-07-19 16:17                                       ` Matthieu Moy
2007-07-19 16:21                                       ` David Kastrup
     [not found]                         ` <9436820E-53D1-425D-922E-D4C76578E40A@silverinsanity.com>
     [not found]                           ` <863azk78yp.fsf@lola.quinscape.zz>
2007-07-19 15:08                             ` Brian Gernhardt
2007-07-19 15:27                               ` David Kastrup
2007-07-19 15:50                                 ` Brian Gernhardt
2007-07-20  0:01                               ` Junio C Hamano
2007-07-20  0:15                                 ` Linus Torvalds
2007-07-20  0:33                                   ` Linus Torvalds
2007-07-20  2:24                                     ` Junio C Hamano
2007-07-20  2:31                                       ` Linus Torvalds
2007-07-20  5:55                                         ` David Kastrup
2007-07-20  5:58                                       ` David Kastrup
2007-07-20 15:31                                         ` Linus Torvalds
2007-07-20  5:35                                     ` David Kastrup
2007-07-20  9:27                                       ` Simon 'corecode' Schubert
2007-07-20 10:11                                         ` David Kastrup
2007-07-20 10:34                                         ` Junio C Hamano
2007-07-20 13:23                                           ` David Kastrup
2007-07-20 19:24                                           ` Linus Torvalds
2007-07-20 21:02                                             ` Johan Herland
2007-07-20 21:48                                               ` Linus Torvalds
2007-07-20 22:36                                                 ` Julian Phillips
2007-07-21  0:18                                                   ` Linus Torvalds
2007-07-21  1:23                                                     ` David Kastrup
2007-07-21  3:54                                                       ` David Kastrup
     [not found]                                     ` <7vir8f24o2.fsf@assigned -by-dhcp.cox.net>
2007-07-20  5:53                                       ` David Kastrup
2007-07-20 10:19                                   ` Olivier Galibert
2007-07-19  5:59                       ` David Kastrup
2007-07-19  9:54                         ` David Kastrup
     [not found]                   ` <?= =?ISO-8859-1?Q?alpine.LFD.0.999?= =?ISO-8859-1?Q?.070718=041710271.?= =?ISO-8859-1?Q?27353@woody.linu?= =?ISO-8859-1?Q?x-foundation.org?= =?ISO-8859-1?Q?>
2007-07-22 21:08                     ` David Kastrup
2007-07-21  4:29                 ` David Kastrup
2007-07-21  4:51                   ` Linus Torvalds
2007-07-21  5:08                     ` Linus Torvalds
2007-07-21  5:28                       ` David Kastrup
2007-07-21 15:53                         ` Linus Torvalds [this message]
2007-07-21 17:38                           ` David Kastrup
2007-07-21 17:52                             ` Simon 'corecode' Schubert
2007-07-21 18:08                               ` David Kastrup
2007-07-21 23:50                             ` Linus Torvalds
2007-07-22  0:18                               ` David Kastrup
2007-07-22  0:37                                 ` Linus Torvalds
2007-07-22  1:05                                   ` David Kastrup
2007-07-22  1:41                                     ` Linus Torvalds
2007-07-22  2:39                                       ` David Kastrup
2007-07-22  3:43                                         ` Linus Torvalds
2007-07-22  4:28                                           ` David Kastrup
2007-07-22  6:38                                             ` david
2007-07-22  9:08                                               ` David Kastrup
2007-07-22 17:30                                                 ` Linus Torvalds
2007-07-22 17:59                                                   ` David Kastrup
2007-07-22 17:28                                             ` Linus Torvalds
2007-07-22 17:33                                             ` Linus Torvalds
     [not found]                                             ` <alpine.L FD.0.999.0707221031050.3607@woody.linux-foundation.org>
2007-07-22 18:58                                               ` David Kastrup
2007-07-22  1:16                                 ` Jakub Narebski
2007-07-22  1:39                                   ` David Kastrup
2007-07-22 12:06                                     ` Jakub Narebski
2007-07-22 13:53                                       ` David Kastrup
2007-07-22 20:26                                         ` Jakub Narebski
2007-07-22 22:57                                           ` David Kastrup
2007-07-23  6:05                                             ` David Kastrup
2007-07-23  7:45                                               ` David Kastrup
2007-07-22  0:34                               ` David Kastrup
2007-07-22  4:00                             ` Brian Gernhardt
2007-07-28  8:44                       ` David Kastrup
     [not found]                   ` <?= =?ISO-8859-1?Q?alpine.LFD.0.999?= =?ISO-8859-1?Q?.07072=0402135450.?= =?ISO-8859-1?Q?27249@woody.linu?= =?ISO-8859-1?Q?x-foundation.org?= =?ISO-8859-1?Q?>
2007-07-21  5:15                     ` David Kastrup
2007-07-18 17:34       ` David Kastrup
2007-07-18  0:39 ` Matthieu Moy
2007-07-18  6:16   ` David Kastrup
2007-07-18  6:30     ` Shawn O. Pearce
2007-07-18  2:23 ` Junio C Hamano
2007-07-18  5:56   ` David Kastrup
2007-07-18  6:34     ` Wincent Colaiuta
2007-07-18  6:53     ` Junio C Hamano
     [not found]       ` <867ioyqhgc.fsf@lola.quinscape.zz>
2007-07-18 23:34         ` Junio C Hamano
2007-07-20  8:29       ` Johan Herland
2007-07-20  8:41         ` David Kastrup
2007-07-20 10:20           ` Johan Herland
2007-07-20 10:54             ` David Kastrup
2007-07-20 12:18               ` Johan Herland
     [not found]                 ` <86odi7utdj.fsf@lola.quinscape.zz>
2007-07-20 13:20                   ` Johan Herland
2007-07-20 13:33                     ` David Kastrup
2007-07-22 21:35       ` David Kastrup
2007-07-26 23:33 ` Robin Rosenberg
2007-07-27  5:22   ` David Kastrup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.0.999.0707210832180.27249@woody.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=dak@gnu.org \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).