newbie questions about git design and features (some wrt hg)

* newbie questions about git design and features (some wrt hg)
@ 2007-01-30 16:20 Mike Coleman
  2007-01-30 16:41 ` Johannes Schindelin
                   ` (4 more replies)
  0 siblings, 5 replies; 61+ messages in thread
From: Mike Coleman @ 2007-01-30 16:20 UTC
  To: git

Hi,

I recently decided to jump into the DVCS pool, and I've been studying
what seem to me to be the two leading candidates--git and
mercurial--to try to understand the differences between them in design
and features.  I have some questions that I hope you can enlighten me
on.

1.  As of today, is there any real safety concern with either tool's
repo format?  Is either tool significantly better in this regard?
(Keith Packard's post hints at a problem here, but doesn't really make
the case.)

2.  Does the git packed object format solve the performance problem
alluded to in posts from a year or two ago?

3.  Someone mentioned that git bisect can work between any two
commits, not necessarily just one that happens to be an ancestor of
the other.  This sounds really cool.  Can hg's bisect do this, too?

4.  What is git's index good for?  I find that I like the idea of it,
but I'm not sure I could justify its presence to someone else, as
opposed to having it hidden in the way that hg's dircache (?) is.  Can
anyone think of a good scenario where it's a pretty obvious benefit?

5.  I think I read that there'd been just one incompatible change over
time in the git repo format.  What was it?

6.  Does either tool use hard links?  This matters to me because I do
development on a connected machine and a disconnected machine, using a
usb drive to rsync between.  (Perhaps there'll be some way to transfer
changes using git or hg instead of rsync, but I haven't figured that
out yet.)

7.  I'm a fan of Python, and I'm really a fan of using high-level
languages with performance-critical parts in a lower-level language,
so in that regard, I really like hg's implementation.  If someone
wanted to do it, is a Python clone of git conceivable?  Is there
something about it that just requires C?

8.  It feels like hg is not really comfortable with parallel
development over time on different heads within a single repo.
Rather, it seems that multiple repos are supposed to be used for this.
 Does this lead to any problems?  For example, is it harder or
different to merge two heads if they're in different repo than if
they're in the same repo?

Thanks in advance,
Mike

(I'll probably post this on the hg list as well.)

* Re: newbie questions about git design and features (some wrt hg)
  2007-01-30 16:20 newbie questions about git design and features (some wrt hg) Mike Coleman
@ 2007-01-30 16:41 ` Johannes Schindelin
  2007-01-30 16:55 ` Shawn O. Pearce
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 61+ messages in thread
From: Johannes Schindelin @ 2007-01-30 16:41 UTC
  To: Mike Coleman; +Cc: git

Hi,

On Tue, 30 Jan 2007, Mike Coleman wrote:

> I recently decided to jump into the DVCS pool, and I've been studying 
> what seem to me to be the two leading candidates--git and mercurial--to 
> try to understand the differences between them in design and features.  
> I have some questions that I hope you can enlighten me on.

I don't know Mercurial, but I know git pretty well.

> 1.  As of today, is there any real safety concern with either tool's 
> repo format?  Is either tool significantly better in this regard? (Keith 
> Packard's post hints at a problem here, but doesn't really make the 
> case.)

I can't remember any post hinting at a problem with regard to git's 
safety.

Git is very conservative about data. In fact, even if you fsck up royally, 
chances are very good that you recover without losing any data (which was 
stored in the repo).

Of course, a "rm -rf ." will hurt in any case.

> 2.  Does the git packed object format solve the performance problem 
> alluded to in posts from a year or two ago?

I don't know what posts you mean.

In any case, if performance is an issue for you, you should keep your 
repository packed (with new Git, you should call "git gc" every once in a 
while).

> 3.  Someone mentioned that git bisect can work between any two commits, 
> not necessarily just one that happens to be an ancestor of the other.  
> This sounds really cool.  Can hg's bisect do this, too?

No idea.

> 4.  What is git's index good for?  I find that I like the idea of it, 
> but I'm not sure I could justify its presence to someone else, as
> opposed to having it hidden in the way that hg's dircache (?) is.  Can 
> anyone think of a good scenario where it's a pretty obvious benefit?

Git's index is a staging area. Every VCS has it, but most hide it. One of 
the moments where Git's index really shines is merge conflicts: you can 
inspect _just_ the conflicts by calling "git diff".

> 5.  I think I read that there'd been just one incompatible change over 
> time in the git repo format.  What was it?

Object names were the hash of the _compressed_ contents. Now it's the
uncompressed contents (which allows you to change the compression
parameters without any hassle). This was a _looong_ time ago.

> 6.  Does either tool use hard links?  This matters to me because I do 
> development on a connected machine and a disconnected machine, using a 
> usb drive to rsync between.  (Perhaps there'll be some way to transfer 
> changes using git or hg instead of rsync, but I haven't figured that out 
> yet.)

No idea about Mercurial, but you can clone using hard links with the 
--local option.

There is still a script called git-relink, which can hard link 
unpacked objects of two object databases, but since we usually keep 
everything packed nowadays, I deem this obsolete.

Of course, the better way is to use --shared with clone, so that a 
"virtual hard link" is set up: The alternates mechanism of Git is set up 
such that you reuse (even new) objects from the alternate repository.

> 7.  I'm a fan of Python, and I'm really a fan of using high-level 
> languages with performance-critical parts in a lower-level language, so 
> in that regard, I really like hg's implementation.  If someone wanted to 
> do it, is a Python clone of git conceivable?  Is there something about 
> it that just requires C?

I am not a fan of Python. I think that all the Perl hackers of old days 
migrated to Python, because the code _looks_ nicer. But it's the same old 
crap.

That said, nothing (least of which, me) hinders you reimplementing Git in 
Python. The performance critical parts are in the revision walking, and 
the diff machinery.

The revision walking is not really reentrant (yet), so you would have to 
fix that up a little before being able to link it natively.

> 8.  It feels like hg is not really comfortable with parallel development 
> over time on different heads within a single repo. Rather, it seems that 
> multiple repos are supposed to be used for this. Does this lead to any 
> problems?

Usually not. I do it all the time (keep the branches in the same repo).

> For example, is it harder or different to merge two heads if they're in 
> different repo than if they're in the same repo?

You _have_ to fetch it to merge it (this operation, fetch & merge, is 
called pull with Git), but there is no difference if the branch 
to-be-merged is local or remote. In fact, branches are not supposed to be 
"local" or "remote". You can _have_ them where you want. And you can tell 
if two branches are identical by the object name of their tip.

Hth,
Dscho

* Re: newbie questions about git design and features (some wrt hg)
  2007-01-30 16:20 newbie questions about git design and features (some wrt hg) Mike Coleman
  2007-01-30 16:41 ` Johannes Schindelin
@ 2007-01-30 16:55 ` Shawn O. Pearce
  2007-01-31  1:55   ` Theodore Tso
  2007-01-30 17:44 ` Jakub Narebski
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 61+ messages in thread
From: Shawn O. Pearce @ 2007-01-30 16:55 UTC
  To: Mike Coleman; +Cc: git

Mike Coleman <tutufan@gmail.com> wrote:
> 1.  As of today, is there any real safety concern with either tool's
> repo format?  Is either tool significantly better in this regard?
> (Keith Packard's post hints at a problem here, but doesn't really make
> the case.)

I think the Git format is tighter in terms of compression,
and simpler in terms of understanding and writing code.  I have
personally written the code to read and write the Git repository
format in both C and Java, and in both cases it falls out in just
a few hundred lines of code (assuming you have libz handy to do
the compression/decompression for you).
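
As a rough illustration of how little code that takes, here is a
minimal Python sketch of writing and reading one loose object. It
assumes the commonly described layout, "<type> <size>\0" followed by
the content, all zlib-compressed; encode_object and decode_object are
names made up for this example, not Git functions.

```python
import hashlib
import zlib


def encode_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Return (object name, compressed bytes) for one loose object."""
    # The stored form is "<type> <size>\0" followed by the raw content.
    raw = obj_type.encode() + b" " + str(len(content)).encode() + b"\0" + content
    # The name is the SHA-1 of the uncompressed bytes, header included.
    name = hashlib.sha1(raw).hexdigest()
    return name, zlib.compress(raw)


def decode_object(compressed: bytes) -> tuple[str, bytes]:
    """Inverse of encode_object: return (type, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode(), content


name, stored = encode_object("blob", b"hello\n")
print(name, decode_object(stored))
```

Pack files, trees and commits need more than this, so treat it as a
sketch of the simplest case only.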

The Git format is completely safe with regards to parallel
modification of a repository, which is good for shared repositories
that might have multiple people pushing into it at once.

Git's format is also safe with regards to *any* update.
You literally cannot destroy the repository during an update.
It's impossible.  You'd have to physically destroy the storage device.
(OK, that's overstating it a bit, but it is really hard.)

The point Keith was making was the Git format is "add-only".
Once something has been stored, we NEVER modify it again.  This
bypasses any sort of possible problems that can occur with partial
modifications caused by a process aborting in the middle of a change.
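
The "add-only" idea can be sketched in a few lines of Python. This is
not Git's code: store() is a hypothetical helper, the name is a plain
SHA-1 of the content (no object header), and the temp-file-then-rename
step is one common way to make sure a reader never sees a partly
written file.

```python
import hashlib
import os
import tempfile
import zlib


def store(objects_dir: str, content: bytes) -> str:
    """Add content to an add-only store and return its name."""
    name = hashlib.sha1(content).hexdigest()
    path = os.path.join(objects_dir, name[:2], name[2:])
    if os.path.exists(path):
        return name  # already stored; existing files are never rewritten
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write the complete file under a temporary name first, so an aborted
    # writer leaves at worst a stray temp file, never a half-written object.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(zlib.compress(content))
    os.replace(tmp, path)  # rename the finished file into place
    return name


with tempfile.TemporaryDirectory() as d:
    first = store(d, b"some file contents\n")
    second = store(d, b"some file contents\n")  # no-op: object already exists
    print(first == second)
```

Because an object's name is derived from its content, storing the same
thing twice can simply do nothing the second time.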

I think hg modifies files as it goes, which could cause some issues
when a writer is aborted.  I'm sure they have thought about the
problem and tried to make it safe, but there isn't anything safer
than just leaving the damn thing alone.  :)

> 2.  Does the git packed object format solve the performance problem
> alluded to in posts from a year or two ago?

Yes.  By a huge margin.  Git's *fast*.  Ignore anything from a year
or two ago.

> 3.  Someone mentioned that git bisect can work between any two
> commits, not necessarily just one that happens to be an ancestor of
> the other.  This sounds really cool.  Can hg's bisect do this, too?

No clue.

> 4.  What is git's index good for?  I find that I like the idea of it,
> but I'm not sure I could justify its presence to someone else, as
> opposed to having it hidden in the way that hg's dircache (?) is.  Can
> anyone think of a good scenario where it's a pretty obvious benefit?

It's a good way to stage the stuff in your next commit.  By that I
mean you edit some code.  Then you look at what differs between the
index and your working directory.  You decide "this hunk is good, it
passed the tests, I want to commit that, so toss it into the index".
Now that hunk isn't different anymore.

When it comes time to commit, all of your already reviewed stuff is
staged in the index.  You just need to issue a commit and supply
the message.  But you can leave modified stuff in the working
directory, even for files that were already updated in the index.
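
A toy model may make the staging idea concrete. The sketch below is an
assumption-laden simplification: it stages whole files rather than
hunks, and ToyIndex, add and unstaged are invented names, not Git
internals.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Name a file's content the way a blob is usually described."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


class ToyIndex:
    def __init__(self):
        self.entries = {}  # path -> blob id of the staged content

    def add(self, path, content):
        """Stage the current content of one path."""
        self.entries[path] = blob_id(content)

    def unstaged(self, worktree):
        """Paths whose working-tree content differs from what is staged."""
        return sorted(p for p, c in worktree.items()
                      if self.entries.get(p) != blob_id(c))


worktree = {"a.c": b"v1\n", "b.c": b"v1\n"}
index = ToyIndex()
for path, content in worktree.items():
    index.add(path, content)

worktree["a.c"] = b"v2\n"          # edit, test, decide it is good
index.add("a.c", worktree["a.c"])  # stage it
print(index.unstaged(worktree))    # nothing differs right after staging

worktree["a.c"] = b"v3\n"          # keep editing after staging
print(index.unstaged(worktree))    # the newer edit shows up as unstaged
```

A commit made at this point would record the staged "v2" content of
a.c, while the "v3" edit stays in the working directory.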

This really helps during a merge.  Only the stuff which Git could
not merge for you is seen as different between the index and the
working directory; all of the stuff that Git merged for you is
already staged in the index.  So you can focus on the conflicts,
and stage their resolutions into the index as you go.  This makes
it easier to work through larger merges where more than 1 or 2
files contain conflicts.

> 5.  I think I read that there'd been just one incompatible change over
> time in the git repo format.  What was it?

A LONG time ago, like in the very first version Linus offered out
to the public, we computed the identity of an object using the
SHA-1 hash of the *compressed* data.  This is sensitive to the
compression settings used, and was not the best idea as a result.

It was very quickly changed to compute the identity of the object
using the SHA-1 hash of the raw (user) data, removing any dependence
on the compression routine to always yield the same result for the
same input.
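
The difference is easy to see with Python's zlib and hashlib. This is
only a demonstration of the principle, using two zlib compression
levels and made-up sample data; it does not reproduce the exact object
header Git hashes.

```python
import hashlib
import zlib

data = b"int main(void) { return 0; }\n" * 50

# Naming by the compressed bytes: the name depends on the compression level.
old_style = {level: hashlib.sha1(zlib.compress(data, level)).hexdigest()
             for level in (0, 9)}

# Naming by the raw bytes: the name is the same however the data was stored.
new_style = {level: hashlib.sha1(
                 zlib.decompress(zlib.compress(data, level))).hexdigest()
             for level in (0, 9)}

print(old_style[0] == old_style[9])  # False
print(new_style[0] == new_style[9])  # True
```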

We haven't had a change since then.  We have added some new
compression options which are just that, options.  If you use them,
older Git binaries won't necessarily recognize the repository data,
but these are off by default and can be enabled on a per-repository
basis.  E.g. if you are only using newer Git on a given system you
can enable the newer compression features on all of the repositories
on that system.

> 6.  Does either tool use hard links?  This matters to me because I do
> development on a connected machine and a disconnected machine, using a
> usb drive to rsync between.  (Perhaps there'll be some way to transfer
> changes using git or hg instead of rsync, but I haven't figured that
> out yet.)

Git can use hardlinks if you ask it to.  We only use them for the
repository files, not for the user's actual source files.

Git has its own native transport (git-push, git-fetch) which can
move data between two Git repositories via local filesystem access,
SSH, HTTP, FTP, and rsync (latter two are read-only transports).

> 7.  I'm a fan of Python, and I'm really a fan of using high-level
> languages with performance-critical parts in a lower-level language,
> so in that regard, I really like hg's implementation.  If someone
> wanted to do it, is a Python clone of git conceivable?  Is there
> something about it that just requires C?

Yes, a Python clone of Git is conceivable.  Indeed, there is a
pure Java clone in progress (jgit) for an Eclipse plugin (egit).
If you wanted to rewrite Git in Python, knock yourself out.
But we've ported all of our Python to C, as it's just faster.

> 8.  It feels like hg is not really comfortable with parallel
> development over time on different heads within a single repo.
> Rather, it seems that multiple repos are supposed to be used for this.
> Does this lead to any problems?  For example, is it harder or
> different to merge two heads if they're in different repo than if
> they're in the same repo?

No clue.  I know multiple heads in one Git repository works
*awesome*.  Especially on large repositories (>10k files) as the time
required to start a new branch is only the time needed to update the
files in the working directory which don't have the correct version.
Usually that's a small fraction (<200) of the files and thus it's
very fast to switch to a new branch of development, and switch back.
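
The reason this is cheap can be sketched as a comparison of two
path-to-blob-name maps. The code below is a toy, not Git's checkout
logic: files_to_update is a hypothetical helper and the "blob-a" style
names stand in for real object names.

```python
def files_to_update(current_tree, target_tree):
    """Paths that must be written or deleted to go from one tree to the other."""
    paths = set(current_tree) | set(target_tree)
    return sorted(p for p in paths
                  if current_tree.get(p) != target_tree.get(p))


# A 10,000-file tree and a topic branch that touches three paths.
master = {"src/file%05d.c" % i: "blob-a" for i in range(10000)}
topic = dict(master)
topic["src/file00007.c"] = "blob-b"   # modified on the topic branch
topic["src/new.c"] = "blob-c"         # added on the topic branch
del topic["src/file00042.c"]          # removed on the topic branch

print(files_to_update(master, topic))
```

Files whose blob names match in both trees never need to be touched,
however many of them there are.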

On a decent UNIX system (and my Mac OS X PowerBook doesn't really
count) flipping branches in git-gui is almost immediate.  You pick
the branch in the menu and *wham* it's switched.  And that's my
PowerBook, which as I said, doesn't quite count as a good UNIX
system...

-- 
Shawn.

* Re: newbie questions about git design and features (some wrt hg)
  2007-01-30 16:20 newbie questions about git design and features (some wrt hg) Mike Coleman
  2007-01-30 16:41 ` Johannes Schindelin
  2007-01-30 16:55 ` Shawn O. Pearce
@ 2007-01-30 17:44 ` Jakub Narebski
  2007-01-30 18:06 ` Linus Torvalds
  2007-01-30 18:11 ` Junio C Hamano
  4 siblings, 0 replies; 61+ messages in thread
From: Jakub Narebski @ 2007-01-30 17:44 UTC
  To: git; +Cc: mercurial, mercurial

[Cc: git@vger.kernel.org]
[Followup-To: git@vger.kernel.org aka gmane.comp.version-control.git]

Mike Coleman wrote:

> I recently decided to jump into the DVCS pool, and I've been studying
> what seem to me to be the two leading candidates--git and
> mercurial--to try to understand the differences between them in design
> and features.  I have some questions that I hope you can enlighten me
> on.
> 
> 1.  As of today, is there any real safety concern with either tool's
> repo format?  Is either tool significantly better in this regard?
> (Keith Packard's post hints at a problem here, but doesn't really make
> the case.)

I don't know if Mercurial is safe with respect to being interrupted in
the middle of an update. Git is safe in that regard; the only unsafe
command is git-prune (and it is explicitly marked as such in the
documentation); there were some attempts lately at making it safer.

> 2.  Does the git packed object format solve the performance problem
> alluded to in posts from a year or two ago?

If it was an I/O performance problem, then the packed objects format and,
to a lesser extent (important only if you have a large number of tags
or branches), the packed refs format should solve it.

See http://git.or.cz/gitwiki/GitBenchmarks (most probably biased).

> 3.  Someone mentioned that git bisect can work between any two
> commits, not necessarily just one that happens to be an ancestor of
> the other.  This sounds really cool.  Can hg's bisect do this, too?

The very idea of git bisect was for it to work with nonlinear history.
Otherwise it wouldn't be really necessary.

git-bisect(1) mentions the git-rev-list --bisect option, and in the
description of --bisect in git-rev-list(1) we have:

        Limit output to the one  commit  object  which  is  roughly  halfway
        between the included and excluded commits. Thus, if

                $ git-rev-list --bisect foo ^bar ^baz

        outputs 'midpoint', the output of the two commands

                $ git-rev-list foo ^midpoint
                $ git-rev-list midpoint ^bar ^baz

        would be of roughly the same length. Finding the change which intro-
        duces a regression is thus reduced to a  binary  search:  repeatedly
        generate  and  test  new  'midpoint's  until  the commit chain is of
        length one.

(where "git rev-list foo ^bar" means listing all commits reachable from
commit, tag or branch 'foo' which are not reachable from 'bar').
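
The same idea can be tried on a toy history in Python. This is a
simplified sketch, not the algorithm git-rev-list actually uses:
reachable, rev_list and bisect_midpoint are invented names, the history
is a hand-written dict of parent links, and the midpoint is simply the
commit that splits the candidate set most evenly.

```python
def reachable(parents, tips):
    """All commits reachable from the given tips, following parent links."""
    seen, stack = set(), list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rev_list(parents, include, exclude=()):
    """Toy 'rev-list include ^exclude': reachable from include, not exclude."""
    return reachable(parents, include) - reachable(parents, exclude)


def bisect_midpoint(parents, include, exclude):
    """Pick the candidate that splits the candidate set most evenly."""
    candidates = rev_list(parents, include, exclude)
    n = len(candidates)

    def balance(c):
        below = len(reachable(parents, [c]) & candidates)
        return min(below, n - below)

    # Sorting first makes the choice deterministic when there are ties.
    return max(sorted(candidates), key=balance)


# Nonlinear history: b forks into c and d, which are merged again in e.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"],
           "e": ["c", "d"], "f": ["e"]}

mid = bisect_midpoint(parents, include=["f"], exclude=["a"])
print(mid)
print(sorted(rev_list(parents, ["f"], [mid])))   # still to test if mid is good
print(sorted(rev_list(parents, [mid], ["a"])))   # still to test if mid is bad
```

Nothing here requires 'a' to be a linear ancestor of 'f'; the two
halves are defined purely by reachability.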

> 4.  What is git's index good for?  I find that I like the idea of it,
> but I'm not sure I could justify its presence to someone else, as
> opposed to having it hidden in the way that hg's dircache (?) is.  Can
> anyone think of a good scenario where it's a pretty obvious benefit?

The Git index is used to stage commits, i.e. to build a commit part by
part (make some changes, view the diff of those changes, save them to the
index, make some new changes, view the diff of the new changes, etc.). It
is also very useful in resolving merges and merge conflicts (you can view
the diff of only the conflicted parts). It also makes add / remove
operations easier to understand.

It also allows for some tricks like "make the SCM remove all files it
knows about that are missing from the working directory", or "make the
SCM think that all files are newer when importing from a tar file".

> 5.  I think I read that there'd been just one incompatible change over
> time in the git repo format.  What was it?

If you are referring to the change that sha-1 used to be of compressed
contents, it was IIRC before first public release.

> 6.  Does either tool use hard links?  This matters to me because I do
> development on a connected machine and a disconnected machine, using a
> usb drive to rsync between.  (Perhaps there'll be some way to transfer
> changes using git or hg instead of rsync, but I haven't figured that
> out yet.)

Git can use hard links (git clone --local, git relink) but does not need
to. If you have hard links under version control, git does not check them
out as hard links.

> 7.  I'm a fan of Python, and I'm really a fan of using high-level
> languages with performance-critical parts in a lower-level language,
> so in that regard, I really like hg's implementation.  If someone
> wanted to do it, is a Python clone of git conceivable?  Is there
> something about it that just requires C?

C is for performance. Git is not libified, and it would be hard to get
it fully (or at least its most important parts) libified.

> 8.  It feels like hg is not really comfortable with parallel
> development over time on different heads within a single repo.
> Rather, it seems that multiple repos are supposed to be used for this.
> Does this lead to any problems?  For example, is it harder or
> different to merge two heads if they're in different repo than if
> they're in the same repo?

In git, if you want to merge two heads in different repos, you in fact
first download (fetch) the objects from the other repo, then merge two
local heads, one of which can be temporary (FETCH_HEAD), although usually
we use a local branch (a so-called tracking branch) to always refer to the
objects downloaded from the other repo.

> (I'll probably post this on the hg list as well.)

I suspect the Mercurial mailing list is subscribe-to-post,
unfortunately...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: newbie questions about git design and features (some wrt hg)
  2007-01-30 16:20 newbie questions about git design and features (some wrt hg) Mike Coleman
                   ` (2 preceding siblings ...)
  2007-01-30 17:44 ` Jakub Narebski
@ 2007-01-30 18:06 ` Linus Torvalds
  2007-01-30 19:37   ` Linus Torvalds
  2007-01-30 18:11 ` Junio C Hamano
  4 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-01-30 18:06 UTC
  To: Mike Coleman; +Cc: git

On Tue, 30 Jan 2007, Mike Coleman wrote:
> 
> 1.  As of today, is there any real safety concern with either tool's
> repo format?  Is either tool significantly better in this regard?
> (Keith Packard's post hints at a problem here, but doesn't really make
> the case.)

I think Keith was nervous about hg, because hg
 (a) has changed repo formats a few times and was talking about changing 
     it again (but since I don't follow hg very closely, I don't know if 
     that has happened, will happen, or was shelved)
 (b) modifies data in-place.

Git doesn't really do either. Git has extended the repository format a few 
times (notably pack-files), but apart from a *really* early change at the 
very beginning of development, the git repo format is identical today to 
what it was originally, and you can read old repositories without any 
conversion whatsoever.

Also, the git repository format is (and has always been) "stable" in 
another sense: we never *ever* re-write any old data. Even when we 
re-pack, we write a totally new copy, and while you'd often then get rid 
of duplicates afterwards, the operation is fundamentally safer that way.

> 2.  Does the git packed object format solve the performance problem
> alluded to in posts from a year or two ago?

If you mean the original discussions in the first few months of git 
development, then yes. People used to worry that git's unpacked format was 
not only slow, but also would chew up disk like mad. Both were true, and 
yes, both were solved by the packed format (to the point where I think git 
uses the *least* amount of disk space of any SCM ever made ;)

HOWEVER. Git definitely has a different "performance profile" than many 
other SCM's do, and it's something worth keeping in mind. That has less to 
do with the pack-files than just very fundamental git design.

In particular, *every* other SCM I am aware of does history on a per-file 
basis. Git very fundamentally does not. This means that while git 
outperforms just about anything else, if you expect "individual file 
history" to be any faster than "whole repository history", you're simply 
going to be in for a surprise. It very fundamentally isn't. 

We had this particular performance "anomaly" discussed just the other
week. People seem to be so used to the "file ID" mentality that has its 
roots in RCS etc, that they expect "git log <filename>" to somehow be 
faster than "git log". In git, that's simply not true. History is *always* 
seen as a "full repository history". There simply isn't anything else.
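
A toy model shows why a path argument cannot make the walk shorter. The
sketch assumes a linear history stored as a list of (name, tree) pairs,
where each tree maps a path to a made-up content id; log() is an
invented function, not how "git log" is implemented.

```python
def log(history, path=None):
    """Walk every commit, newest first; optionally keep those touching path."""
    shown = []
    for i in range(len(history) - 1, -1, -1):
        name, tree = history[i]
        parent_tree = history[i - 1][1] if i > 0 else {}
        # The path only filters what is shown; every commit is still visited
        # and compared against its parent.
        if path is None or tree.get(path) != parent_tree.get(path):
            shown.append(name)
    return shown


history = [
    ("c1", {"README": "r1"}),
    ("c2", {"README": "r1", "main.c": "m1"}),
    ("c3", {"README": "r2", "main.c": "m1"}),
    ("c4", {"README": "r2", "main.c": "m2"}),
]

print(log(history))            # whole-repository history
print(log(history, "main.c"))  # same walk, limited to one path
```

The path limiter changes the output, not the amount of history that has
to be looked at, and it would work the same for a set of paths or a
directory prefix.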

I personally don't see this as a "problem", but it definitely is 
*different*. And it causes a different performance profile for various 
operations than you'd see with other SCM's.

[ The reason I don't think this is a problem is because it's partly what 
  makes whole-repository operations like "merge" so fast. But it's also 
  the thing that causes git to very naturally not care about single files, 
  and anything you can do with a single file you can basically do with an 
  arbitrary set of files or directories. Which is *very* powerful, and as 
  far as I know, no other SCM can effectively do that at all.

  As a top-level maintainer of a project with tens of thousands of files, 
  I end up almost never looking at individual files: I look at collections 
  of files. And that's where git shines, and almost everybody else falls 
  flat on their face. But if you have the "single-file" mentality, you 
  will find operations that you think git does badly. ]

> 3.  Someone mentioned that git bisect can work between any two
> commits, not necessarily just one that happens to be an ancestor of
> the other.  This sounds really cool.  Can hg's bisect do this, too?

I suspect it can - as far as I know, the whole "bisect" thing originated 
with git, and hg picked up the idea from that. You'd have to be really 
stupid (and/or have a horrible repo format) to not be able to do multiple 
unrelated commits.

HOWEVER! One thing that may make it less useful in hg is that last I 
heard, hg didn't do multiple independent branches in the same repository. 
So some of the more useful usage scenarios may simply not be viable in hg 
at all (ie you'd have to merge in order to bring the two unrelated commits 
into the same hg repository, and merging may not always be possible).

So with git, you can say "that branch is good, this branch is bad, what 
caused the regression?" by using "git bisect". In hg, I'm not sure that 
works, simply because of the weakness of branches. But you'd have to ask 
the hg lists. They do have *some* concept of branches within a repo, so it 
may well be that it all works out.

> 4.  What is git's index good for?  I find that I like the idea of it,
> but I'm not sure I could justify its presence to someone else, as
> opposed to having it hidden in the way that hg's dircache (?) is.  Can
> anyone think of a good scenario where it's a pretty obvious benefit?

It's a huge deal during merging with conflicts.

During merging, the index is the part that shows you what the conflicts 
are, and also where you mark any conflict resolution while the working 
tree is still not fully resolved. However, it's kind of hard to show the 
"obvious benefit" without actually showing an example of a real (and 
complex) merge conflict, and I'm way too lazy for that.

It has advantages in many other situations too, but they are more subtle. 
One of the things _I_ consider to be an advantage (but which confuses some 
people because it's also another thing that makes git different from many 
other SCM systems) is that the index is also where you "prepare" your work 
for committing, and this is especially obvious when adding new files.

Every single SCM has *some* kind of an index, even if it's as simple as 
just the CVS "list of files I know about". So in CVS, the "index" is 
really just the "CVS/Entries" list. You really can think of the git index 
as just a "CVS/Entries" kind of thing, done right.

So what does "done right" mean? It means that the git index not only lists 
the filenames, it lists their *contents* and status too. That means that 
when you do a "git add", you don't just add a filename to the list of 
files you know about, you literally add the *content*.
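
To make "contents and status" concrete, here is a toy index in Python.
It is a sketch under assumptions: Entry, unmerged and resolve are
invented names, the blob names are placeholders, and the only detail
borrowed from Git is the usual convention that stage 0 means "merged"
while stages 1, 2 and 3 hold the common ancestor, our side and their
side of a conflict.

```python
from collections import namedtuple

# One index entry: a path plus the mode and blob name of its content.
Entry = namedtuple("Entry", "path stage mode blob")

index = [
    Entry("Makefile", 0, 0o100644, "1111111"),
    Entry("main.c", 1, 0o100644, "aaaaaaa"),  # common ancestor
    Entry("main.c", 2, 0o100644, "bbbbbbb"),  # our side
    Entry("main.c", 3, 0o100644, "ccccccc"),  # their side
]


def unmerged(index):
    """Paths that still have conflict stages recorded."""
    return sorted({e.path for e in index if e.stage != 0})


def resolve(index, path, mode, blob):
    """Replace the conflict stages for path with a single resolved entry."""
    kept = [e for e in index if e.path != path]
    return sorted(kept + [Entry(path, 0, mode, blob)])


print(unmerged(index))
index = resolve(index, "main.c", 0o100644, "ddddddd")
print(unmerged(index))
```

A bare list of filenames, CVS/Entries style, has no place to keep the
three competing versions of main.c while the conflict is being worked
on.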

The reason this is important is that this is fundamentally how git works: 
git doesn't actually really *ever* work with filenames at any stage at all, 
git either works with "content" (which obviously includes the notion of a 
filename, but it is also the mode of the file and the content of the 
file), and git also has a notion of "pathname limiter", which basically 
works on a repository "tree" level, and limits the content to just a 
subset of the whole tree.

So the "index" is very much part of this - it's just another portion of 
the fact that git always tracks *contents* and never tracks "file ID's".

So in CVS (or SVN), when you do a "cvs add", you really don't add any 
content to the repository, you are really adding a new "file ID" to the 
list of files that CVS/SVN tracks. In git, when you do "git add", you are 
really adding content, but that also means that the index - the 
"CVS/Entries" replacement - has to be able to track things differently.

Anyway, if you come from CVS, and have worked with it intimately enough 
that you know how things like CVS/Entries work, it should actually be 
fairly easy to pick up on the git index. You just need to mentally realize 
"oh, it contains the contents, file mode and merge conflict state too!"

> 5.  I think I read that there'd been just one incompatible change over
> time in the git repo format.  What was it?

The original git object naming was to first compress the object, and then 
calculate the SHA1 of the compressed end result. That was stupid, stupid, 
and I admit it.  I switched it around.

However, to get some notion of how early this was, the first git release 
was done on April 7, 2005. The change-over to switch the compression and 
SHA-1 hashing around was done April 20, 2005. There was an additional 
fix to do the date handling more sanely, April 23, 2005. The format has 
been stable since.

So yes, there has been one real format change, and it happened two weeks 
into development, long before git was really usable by mere mortals at 
all.

After that, we have added capabilities to the database (notably, the 
packed files, and a new simplified loose object format), but as far as I 
know, current git will happily read any git archive written after April 
23rd, 2005. With no data conversion necessary.

Going the other way is obviously not always possible. If you get a git 
from May of 2005, and try to use it on an archive that uses pack-files, it 
obviously will *not* work. But even there, we've been very careful, and 
unless you set some specific options in your config file or do things like 
explicitly pack your branch head/tag references, fairly old versions of 
git will happily read even new archives.

> 6.  Does either tool use hard links?  This matters to me because I do
> development on a connected machine and a disconnected machine, using a
> usb drive to rsync between.  (Perhaps there'll be some way to transfer
> changes using git or hg instead of rsync, but I haven't figured that
> out yet.)

I don't know about hg (but will assume not). Git generally does not, but 
doesn't mind them either if you have them in your working tree.

And yes, there are ways to transfer using git natively, and they tend to 
be a lot more useful and safe than rsync.

> 7.  I'm a fan of Python, and I'm really a fan of using high-level
> languages with performance-critical parts in a lower-level language,
> so in that regard, I really like hg's implementation.  If someone
> wanted to do it, is a Python clone of git conceivable?  Is there
> something about it that just requires C?

It doesn't "require" C in the sense that the object format is actually 
fairly simple, and you could do things natively in python if you *really* 
wanted. That said, the whole approach of git has always been to write the 
"core" core in C, and just make the thing very scriptable. 

Some things simply are not sensible to do in a slow interpreted language. 
Things like generating diffs (another name for "comparing two trees") are 
fundamentally much too performance-sensitive for anything but a serious 
system language. You need a compiled language with a good compiler, no 
"byte code pre-compilers" need apply. Same goes for the "view repository 
through a filename filter" thing.

We used to have our standard "merge" function written in python, but 
mainly because it was our *only* python dependency, it actually got 
rewritten in C (also, people - including me - really expect to merge two 
branches with 20+ _thousand_ files in them in less than a second, so that 
may explain another reason why the merge got rewritten).

> 8.  It feels like hg is not really comfortable with parallel
> development over time on different heads within a single repo.
> Rather, it seems that multiple repos are supposed to be used for this.
> Does this lead to any problems?  For example, is it harder or
> different to merge two heads if they're in different repo than if
> they're in the same repo?

That is my understanding too, but I've not followed hg actively.

The git branching model really is superior. It might take a while to get 
used to it (it took _me_ a while to get used to it ;), but once you do, 
everybody else so *obviously* does it so horribly badly that it's not even 
funny.

So the whole "multiple branches in the same repo" thing really shines in 
git. SCM's like SVN *claim* that they do multiple branches, but they 
really don't. They are just confused.

		Linus


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-30 16:20 newbie questions about git design and features (some wrt hg) Mike Coleman
                   ` (3 preceding siblings ...)
  2007-01-30 18:06 ` Linus Torvalds
@ 2007-01-30 18:11 ` Junio C Hamano
  2007-01-31  3:38   ` Mike Coleman
  4 siblings, 1 reply; 61+ messages in thread
From: Junio C Hamano @ 2007-01-30 18:11 UTC (permalink / raw)
  To: Mike Coleman; +Cc: git

"Mike Coleman" <tutufan@gmail.com> writes:

> Hi,

Hi, Mike.

I won't go into "comparison", and I know other people will fill
in the details on other things.

> 5.  I think I read that there'd been just one incompatible change over
> time in the git repo format.  What was it?

There actually were a few incompatible ones but they were only
during the first few weeks of life (the beginning of time is Apr
7th, 2005):

 - The tree object was originally a flat "manifest" but was
   converted to hierarchy of trees (Apr 10th, 2005)

 - The metainformation directory was originally called .dircache,
   but renamed to ".git" (Apr 11th, 2005)

 - The order of compression and hashing was swapped and older
   objects needed conversion (Apr 20, 2005)
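
The list above does not say which order won. In git as it exists today, an object is named by the hash of its uncompressed bytes and zlib is applied afterwards for storage. The sketch below shows one consequence of that order: the name does not depend on the compression level. It is simplified (real git also hashes a type/size header), and `store` and `load` are hypothetical helper names.

```python
import hashlib
import zlib
from typing import Tuple


def store(content: bytes, level: int) -> Tuple[str, bytes]:
    """Name the object by its uncompressed bytes, then compress for storage."""
    object_id = hashlib.sha1(content).hexdigest()
    return object_id, zlib.compress(content, level)


def load(object_id: str, stored: bytes) -> bytes:
    """Decompress the stored bytes and check them against the object name."""
    content = zlib.decompress(stored)
    if hashlib.sha1(content).hexdigest() != object_id:
        raise ValueError("object is corrupt")
    return content


data = b"int main(void) { return 0; }\n" * 50
id_fast, stored_fast = store(data, 1)
id_best, stored_best = store(data, 9)

# Same name whatever the compression level; content survives the round trip.
assert id_fast == id_best
assert load(id_best, stored_best) == data
```

Had the hash been taken over the compressed bytes instead, the object name would depend on whatever zlib happened to produce.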


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-30 18:06 ` Linus Torvalds
@ 2007-01-30 19:37   ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-01-30 19:37 UTC (permalink / raw)
  To: Mike Coleman; +Cc: git



On Tue, 30 Jan 2007, Linus Torvalds wrote:
> 
> We had this particular performance "anomaly" be discussed just the other 
> week. People seem to be so used to the "file ID" mentality that has its 
> roots in RCS etc, that they expect "git log <filename>" to somehow be 
> faster than "git log". In git, that's simply not true. History is *always* 
> seen as a "full repository history". There simply isn't anything else.

Side note: some people have talked about changing this by generating some 
kind of per-filename cache to make logging ops have an "accelerated" mode 
for the trivial cases.

So maybe git at some future date will have a special-case for a single 
filename, but that's definitely not the case today.
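
A toy model of why limiting the log to one path cannot skip any work: every commit is still visited, and the path only decides which commits get shown. This is a rough sketch, not git's actual revision walker. It assumes a linear history, and the `Commit` class and `log` function are invented for the illustration.

```python
from typing import Dict, List, Optional


class Commit:
    def __init__(self, name: str, tree: Dict[str, str],
                 parent: Optional["Commit"] = None) -> None:
        self.name = name
        self.tree = tree      # path -> content id: a full snapshot, no file IDs
        self.parent = parent


def log(head: Optional[Commit], path: Optional[str] = None) -> List[str]:
    """Walk every commit from head; optionally show only those changing path."""
    shown = []
    commit = head
    while commit is not None:
        parent_tree = commit.parent.tree if commit.parent else {}
        # The path filter is just a comparison of two snapshots.
        if path is None or commit.tree.get(path) != parent_tree.get(path):
            shown.append(commit.name)
        commit = commit.parent
    return shown


c1 = Commit("c1", {"Makefile": "m1", "main.c": "a1"})
c2 = Commit("c2", {"Makefile": "m1", "main.c": "a2"}, c1)
c3 = Commit("c3", {"Makefile": "m2", "main.c": "a2"}, c2)

print(log(c3))              # ['c3', 'c2', 'c1']
print(log(c3, "Makefile"))  # ['c3', 'c1']
```

The filtered call does the same walk as the unfiltered one plus a comparison per commit, which is why it is not faster.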

			Linus


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-30 16:55 ` Shawn O. Pearce
@ 2007-01-31  1:55   ` Theodore Tso
  2007-01-31 10:56     ` Jakub Narebski
  0 siblings, 1 reply; 61+ messages in thread
From: Theodore Tso @ 2007-01-31  1:55 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Mike Coleman, git

On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
> I think hg modifies files as it goes, which could cause some issues
> when a writer is aborted.  I'm sure they have thought about the
> problem and tried to make it safe, but there isn't anything safer
> than just leaving the damn thing alone.  :)

To be fair hg modifies files using O_APPEND only.  That isn't quite as
safe as "only creating new files", but it is relatively safe.

One other interesting point which came as a surprise to me is the size
of the repositories.  It used to be that git was much more inefficient
at storing files than Mercurial.  However, with pack files and
deltas, I'm pleased to say a mercurial repository with all of
e2fsprogs' history is 20 megs, while a git repository with the same
history is 11 megs.   

(BTW, I've been doing some hacking of Stelian Pop's hg2git.py in my
spare time.  Most of the changes are e2fsprogs specific, but if anyone
else is hacking on it, I'd love to compare notes.  I have some vague
thoughts about making hg2git to be bidirectional, but I probably won't
have time to implement it.)

> Yes.  By a huge margin.  Git's *fast*.  Ignore anything from a year
> or two ago.

I'd go even further.  You probably want to use the latest git 1.5.0 rc
release, or final, since there have been a lot of usability and
documentation improvements over previous git releases.

> > 4.  What is git's index good for?  I find that I like the idea of it,
> > but I'm not sure I could justify it's presence to someone else, as
> > opposed to having it hidden in the way that hg's dircache (?) is.  Can
> > anyone think of a good scenario where it's a pretty obvious benefit?

In git 1.5.0 it's a lot easier for users to not have to worry about
the index if they don't want to.  It's not quite so much in the user's
face, although there is still improvement to be had, especially in the
git documentation.  There is a git user's manual being prepared (that
I think will be in 1.5.0, hopefully) that is much better than "man git".

> This really helps during a merge.  Only the stuff which Git could
> not merge for you is seen as different between the index and the
> working directory; all of the stuff that Git merged for you is
> already staged in the index.  So you can focus on the conflicts,
> and stage their resolutions into the index as you go.  This makes
> it easier to work through larger merges where more than 1 or 2
> files contains conflicts.

The flip side of this is that mercurial has much better integration
with graphical merge tools, which git doesn't have by default (yet).

> > 8.  It feels like hg is not really comfortable with parallel
> > development over time on different heads within a single repo.
> > Rather, it seems that multiple repos are supposed to be used for this.
> > Does this lead to any problems?  For example, is it harder or
> > different to merge two heads if they're in different repo than if
> > they're in the same repo?

hg has only recently added support for development on different heads
within the same repo.  So it's just more immature there.  Presumably
over time it will get better.  Most people who use hg don't use a
lot of different branches, for things like topic branches, for
example.  If you prefer to use that style of interaction, git is going
to be a much better choice for you.

Regards,

							- Ted


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-30 18:11 ` Junio C Hamano
@ 2007-01-31  3:38   ` Mike Coleman
  2007-01-31  4:35     ` Linus Torvalds
  2007-01-31 15:03     ` Nicolas Pitre
  0 siblings, 2 replies; 61+ messages in thread
From: Mike Coleman @ 2007-01-31  3:38 UTC (permalink / raw)
  To: git

Thanks for all of your replies--this information is very helpful.
Though both hg and git look good, I will probably try git first,
partly because it seems the most interesting.  It feels like fertile
ground for experiments, and I suspect someone will think of some
surprising application for it.  (Also, I had the privilege of working
with Junio in a past life, and I consider his involvement a good
portent.)

This mercurial list post by Ted Tso was also useful:

    http://www.selenic.com/pipermail/mercurial/2007-January/012039.html

Regarding a Python (or other interpreted language) implementation, the
most obvious practical benefit would be an easy win32 port.  Not that
I'd ever choose to develop there, but it removes its lack as an
objection in some organizational settings (such as mine).  Someone
mentioned a Java port--that'd cover that base quite well.

As for performance, my thinking was that since hg is implemented
apparently almost entirely in Python, and has (again apparently)
generally acceptable performance, this suggested that much of the
problem might be I/O-bound enough that language efficiency might not
matter so much.

Aside: The program for which I'm considering trying git does mass spec
protein identification and has (in the general case) exponential
runtime, all of it CPU.  Run times on a 500-node cluster start at two
hours and go up rapidly.  You might think at first that this wouldn't
be a good candidate for Python, but so far this looks to be incorrect.
 The simple reason: asymptotically, all of the run time happens in
about four functions.  Given that, and friendly constants, what was
about 15K (*) lines of C++ has turned into somewhat less than 1K lines
of C++ and 1K lines of Python--it's difficult to gauge because so many
new features have been added.  Somewhat ironically, the worst
performance issue seems to be C++'s obscure (to me) object
construction costs--I may end up just switching the C++ part to C.

There are many axes of design to be considered, of course, but the
moral I took away from that is that better than asking "Does this
program have to be really fast?", one should ask "How many lines of
this program could run 20x slower (than C) without significantly
affecting overall performance?"  If the answer is 80%, it might be
worth thinking about.  Skepticism is always in order, of course.
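
The question above can be put into a back-of-the-envelope formula. If the code moved to the slower language accounts for a fraction f of the original run time and then runs k times slower, the total becomes (1 - f) + f * k times the original. The numbers below are made-up inputs for illustration, not measurements from the program described here.

```python
def overall_slowdown(time_fraction: float, slow_factor: float) -> float:
    """Total run time relative to the original after slowing part of it down.

    time_fraction: share of the original run time spent in the slowed code.
    slow_factor:   how many times slower that code becomes.
    """
    return (1.0 - time_fraction) + time_fraction * slow_factor


# Hypothetical: the slowed lines account for 1% of run time, and run 20x slower.
print(round(overall_slowdown(0.01, 20.0), 2))  # 1.19

# Hypothetical: the slowed lines account for half of the run time instead.
print(round(overall_slowdown(0.50, 20.0), 2))  # 10.5
```

What matters is the share of run time, not the share of lines: 80% of the lines can be slowed down cheaply only if they account for very little of the time.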

Mike

(*) via David Wheeler's sloccount


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31  3:38   ` Mike Coleman
@ 2007-01-31  4:35     ` Linus Torvalds
  2007-01-31  4:57       ` Junio C Hamano
  2007-01-31  7:11       ` Mike Coleman
  2007-01-31 15:03     ` Nicolas Pitre
  1 sibling, 2 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-01-31  4:35 UTC (permalink / raw)
  To: Mike Coleman; +Cc: git

On Tue, 30 Jan 2007, Mike Coleman wrote:
> 
> As for performance, my thinking was that since hg is implemented
> apparently almost entirely in Python, and has (again apparently)
> generally acceptable performance, this suggested that much of the
> problem might be I/O-bound enough that language efficiency might not
> matter so much.

Note that git actually implements a lot more than hg does.

hg depends on external programs (almost uniformly written in C) to do the 
actual diff generation, 3-way merging etc. 

Git actually ends up doing all of those internally, and minimizes external 
dependencies that way. More importantly, perhaps, it allows us to do a 
better job, faster. The early example of this is patch application, where 
git supports a much nicer patch format that can express renames etc in the 
patch.

But I'll admit - my main reason going with C is (a) it's what I know and 
(b) I absolutely _hate_ being constrained by the language. The great thing 
about C (still) is that you can do *anything* in it. You're literally 
limited by hardware, and by your own abilities. Nothing else.

			Linus


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31  4:35     ` Linus Torvalds
@ 2007-01-31  4:57       ` Junio C Hamano
  2007-01-31 16:22         ` Linus Torvalds
  2007-01-31  7:11       ` Mike Coleman
  1 sibling, 1 reply; 61+ messages in thread
From: Junio C Hamano @ 2007-01-31  4:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But I'll admit - my main reason going with C is (a) it's what I know and 
> (b) I absolutely _hate_ being constrained by the language. The great thing 
> about C (still) is that you can do *anything* in it. You're literally 
> limited by hardware, and by your own abilities. Nothing else.
>
> 			Linus

Well, if you count "time" as part of your own ability then that
is true.  Some things are too cumbersome and not performance
critical enough to do in C.


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31  4:35     ` Linus Torvalds
  2007-01-31  4:57       ` Junio C Hamano
@ 2007-01-31  7:11       ` Mike Coleman
  1 sibling, 0 replies; 61+ messages in thread
From: Mike Coleman @ 2007-01-31  7:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On 1/30/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> But I'll admit - my main reason going with C is (a) it's what I know and
> (b) I absolutely _hate_ being constrained by the language. The great thing
> about C (still) is that you can do *anything* in it. You're literally
> limited by hardware, and by your own abilities. Nothing else.

I have a lot of sympathy for this way of thinking, and maybe even a
little envy of people who've found worthy things to do at this level.

But I'm also very lazy (and probably not that talented).  So, I count
myself lucky that I've found a niche in the age of Python/Ruby/etc and
Moore's Law.  I turn programs around quickly, telling users "come back
if you need it to go faster", at which point I would profile and drop
to C, or even assembler.  But they never do--they just want the next
program.

Mike


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31  1:55   ` Theodore Tso
@ 2007-01-31 10:56     ` Jakub Narebski
  2007-01-31 20:01       ` Junio C Hamano
  2007-01-31 22:25       ` Matt Mackall
  0 siblings, 2 replies; 61+ messages in thread
From: Jakub Narebski @ 2007-01-31 10:56 UTC (permalink / raw)
  To: git; +Cc: mercurial

Theodore Tso wrote:

> On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
>> I think hg modifies files as it goes, which could cause some issues
>> when a writer is aborted.  I'm sure they have thought about the
>> problem and tried to make it safe, but there isn't anything safer
>> than just leaving the damn thing alone.  :)
> 
> To be fair hg modifies files using O_APPEND only.  That isn't quite as
> safe as "only creating new files", but it is relatively safe.

From (libc.info):

 -- Macro: int O_APPEND
     The bit that enables append mode for the file.  If set, then all
     `write' operations write the data at the end of the file, extending
     it, regardless of the current file position.  This is the only
     reliable way to append to a file.  In append mode, you are
     guaranteed that the data you write will always go to the current
     end of the file, regardless of other processes writing to the
     file.  Conversely, if you simply set the file position to the end
     of file and write, then another process can extend the file after
     you set the file position but before you write, resulting in your
     data appearing someplace before the real end of file.
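
A small demonstration of the behaviour described in that excerpt, as a sketch for a POSIX system: with O_APPEND set, a write lands at the end of the file even after the file position has been moved to the start. The function name `append_demo` is made up for the example.

```python
import os
import tempfile


def append_demo() -> bytes:
    """Write through an O_APPEND descriptor after seeking to offset 0."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"first\n")

        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        try:
            os.lseek(fd, 0, os.SEEK_SET)  # move the file position to the start
            os.write(fd, b"second\n")     # the data is still appended
        finally:
            os.close(fd)

        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(append_demo())  # b'first\nsecond\n'
```

This shows only where the bytes go. It says nothing about making a multi-file update all-or-nothing.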

I don't quite understand how that would help hg (Mercurial) to have
operations like commit, pull/fetch or push atomic, i.e. all or nothing.
In hg you have to update individual files (blobs buckets) storing delta
and perhaps full version, update manifest file (flat tree) and update
changelog (commit): what happens if for example there are two concurrent
operations trying to update repository, e.g. two push operations in parallel
(from two different developers), or fetch from cron and commit? What
happens if operation is interrupted (e.g. lost connection to network during
fetch)?

In git both situations result in some prune-able and fsck-visible crud in
repository, but repository stays uncorrupted, and all operations are atomic
(all or nothing).
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31  3:38   ` Mike Coleman
  2007-01-31  4:35     ` Linus Torvalds
@ 2007-01-31 15:03     ` Nicolas Pitre
  2007-01-31 16:58       ` Mike Coleman
  1 sibling, 1 reply; 61+ messages in thread
From: Nicolas Pitre @ 2007-01-31 15:03 UTC (permalink / raw)
  To: Mike Coleman; +Cc: git

On Tue, 30 Jan 2007, Mike Coleman wrote:

> As for performance, my thinking was that since hg is implemented
> apparently almost entirely in Python, and has (again apparently)
> generally acceptable performance, this suggested that much of the
> problem might be I/O-bound enough that language efficiency might not
> matter so much.

Matt Mackall said himself that some core portions of hg have been 
rewritten in C in order to improve performance.


Nicolas


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31  4:57       ` Junio C Hamano
@ 2007-01-31 16:22         ` Linus Torvalds
  2007-01-31 16:41           ` Johannes Schindelin
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-01-31 16:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, 30 Jan 2007, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> > 
> > But I'll admit - my main reason going with C is (a) it's what I know and 
> > (b) I absolutely _hate_ being constrained by the language. The great thing 
> > about C (still) is that you can do *anything* in it. You're literally 
> > limited by hardware, and by your own abilities. Nothing else.
> 
> Well, if you count "time" as part of your own ability then that
> is true.  Some things are too cumbersome and not performance
> critical enough to do in C.

Sure. I'd probably not do some graphical front-end in C - although some of 
the toolkits make that reasonable too. 

But even for "time", C actually does have a number of big advantages that 
some people often seem to overlook:

 - it has absolutely tons of infrastructure. Something like Perl comes 
   *close*, but in the end, even the Perl CPAN stuff is just a drop in the 
   bucket for what somebody programming in C has. Other scripting 
   languages? Outside of their specific things (ie the tcl/tk kind of 
   thing), they really have nothing.

 - perhaps even more importantly: there's a ton of clueful people who know 
   it. Maybe this stems from my personal blinders on what "competent" is 
   (and from just my sheltered life in general), but absolutely everybody 
   who is deeply competent will know C. Not everybody will want to program 
   in it, but they *all* know enough to be able to work with it.

The latter one is rather relevant for open source programming. Finding 
some really competent person who has written a library to do (say, purely 
hypothetically - NOT!) a clean and efficient "diff" implementation can be 
a huge deal. And you will find that using C. 

So yeah, C is low-level. Yeah, you have to know how "pointers" work. And 
yeah, it takes effort especially to get started. But once you have gotten 
started, you realize that:

 - it may have been a lot more work to get over the hump, but once you 
   did, you can find people who can work with you and help you.

 - yeah, you didn't really want to work with people who didn't know how a 
   "pointer to a function returning a const pointer" really works.

I agree that C is a really hard language for "prototyping". And yes, I'll 
also agree that probably 95% of all programming is really about 
prototyping. Make something that works, and move on. In that environment, 
C is simply wrong.

But in a real core infrastructure environment, I'd say that almost 
anything *but* C (or "fairly similar" language) tends to be a mistake.

So I personally tend to always work on that infrastructure thing, which is 
why I love C. If it's not "core enough" that C is the proper language, I'm 
probably simply not interested.

And yeah, it will change. I realize that. My bet is that C will remain as 
the default "system language" for at least another decade. 

Of course, is an SCM "core enough"? Some parts definitely are. The actual 
low-level diff generation fairly obviously is. Is revision walking? 
Per-file operations? hg and git disagree about that decision.

		Linus


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31 16:22         ` Linus Torvalds
@ 2007-01-31 16:41           ` Johannes Schindelin
  0 siblings, 0 replies; 61+ messages in thread
From: Johannes Schindelin @ 2007-01-31 16:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

On Wed, 31 Jan 2007, Linus Torvalds wrote:

> So yeah, C is low-level. Yeah, you have to know how "pointers" work. And 
> yeah, it takes effort especially to get started. But once you have 
> gotten started, you realize that:
> 
>  - it may have been a lot more work to get over the hump, but once you 
>    did, you can find people who can work with you and help you.
> 
>  - yeah, you didn't really want to work with people who didn't know how a 
>    "pointer to a function returning a const pointer" really works.

Probably related to the second point:

- you do _not_ want to work with people who are scared of pointers. Most 
  such people are only scared of it, because they are not _able_ to clean 
  up after themselves. This leads _invariably_ to bad code.

For example, I have never ever seen such bad code as in Java. If you are not 
forced by the language to clean up the data structures, you tend to get 
lazy. You don't free memory (why should I? It's garbage collected anyway, 
right?), you don't close resources, you _waste_ time by using incorrect 
data-types or doing wholesale copying all the time.

Just look at Eclipse's source code. *tries not to vomit on the keyboard*

All this is a real pity, because when you see Java code by a guy who 
learnt the ropes in C, and learnt Java properly, it is just elegant and 
concise. And it gives a huge development boost, because you have so much 
infrastructure already.

> I agree that C is a really hard language for "prototyping".

That depends. I cannot do it, I am too stupid. But I saw a guy prototyping 
in assembler, using his assembler library. That was _fast_!

Oh well, I try to stop rambling for today, and do something productive 
again.

Ciao,
Dscho


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31 15:03     ` Nicolas Pitre
@ 2007-01-31 16:58       ` Mike Coleman
  0 siblings, 0 replies; 61+ messages in thread
From: Mike Coleman @ 2007-01-31 16:58 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git

On 1/31/07, Nicolas Pitre <nico@cam.org> wrote:
> Matt Mackall said himself that some core portions of hg have been
> rewritten in C in order to improve performance.

For what it's worth, sloccount reports the following:

git: 52K C, 17K Perl, 10K sh, 6K Tcl, 300 Python

hg: 14K Python, 700 C

Speculating wildly, I'd be surprised if the C part of git couldn't be
reduced below 5K, at a cost of an 8K increase in Perl (or Python), and
not more than a doubling of runtime.  (This speculation is for
entertainment purposes only--I'm not suggesting a course of action.)

Mike


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31 10:56     ` Jakub Narebski
@ 2007-01-31 20:01       ` Junio C Hamano
  2007-01-31 22:25       ` Matt Mackall
  1 sibling, 0 replies; 61+ messages in thread
From: Junio C Hamano @ 2007-01-31 20:01 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, mercurial

Jakub Narebski <jnareb@gmail.com> writes:

> Theodore Tso wrote:
>> 
>> To be fair hg modifies files using O_APPEND only.  That isn't quite as
>> safe as "only creating new files", but it is relatively safe.
>
> From (libc.info):
>  -- Macro: int O_APPEND
>  ...
> I don't quite understand how that would help hg (Mercurial) to have
> operations like commit, pull/fetch or push atomic, i.e. all or nothing.

If I remember correctly, thanks to their log-like file format,
they can rely on O_APPEND to do the right thing when growing,
and aborting the current transaction is just a truncate away (or
a set of truncates on the files appended in the transaction, if
hg touches more than one log-like file but I do not know if hg
uses only one file or more than one).  That's one of the things
I found clean and beautiful (from theoretical point of view, at
least) in their design.  I do not think O_APPEND is used to
control concurrent operations.


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31 10:56     ` Jakub Narebski
  2007-01-31 20:01       ` Junio C Hamano
@ 2007-01-31 22:25       ` Matt Mackall
  2007-01-31 23:58         ` Jakub Narebski
  1 sibling, 1 reply; 61+ messages in thread
From: Matt Mackall @ 2007-01-31 22:25 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: mercurial, git

On Wed, Jan 31, 2007 at 11:56:01AM +0100, Jakub Narebski wrote:
> Theodore Tso wrote:
> 
> > On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
> >> I think hg modifies files as it goes, which could cause some issues
> >> when a writer is aborted.  I'm sure they have thought about the
> >> problem and tried to make it safe, but there isn't anything safer
> >> than just leaving the damn thing alone.  :)
> > 
> > To be fair hg modifies files using O_APPEND only.  That isn't quite as
> > safe as "only creating new files", but it is relatively safe.
> 
> > From (libc.info):
> 
>  -- Macro: int O_APPEND
>      The bit that enables append mode for the file.  If set, then all
>      `write' operations write the data at the end of the file, extending
>      it, regardless of the current file position.  This is the only
>      reliable way to append to a file.  In append mode, you are
>      guaranteed that the data you write will always go to the current
>      end of the file, regardless of other processes writing to the
>      file.  Conversely, if you simply set the file position to the end
>      of file and write, then another process can extend the file after
>      you set the file position but before you write, resulting in your
>      data appearing someplace before the real end of file.
> 
> I don't quite understand how that would help hg (Mercurial) to have
> operations like commit, pull/fetch or push atomic, i.e. all or nothing.

That's because it's unrelated.

> In hg you have to update individual files (blobs buckets) storing delta
> and perhaps full version, update manifest file (flat tree) and update
> changelog (commit): what happens if for example there are two concurrent
> operations trying to update repository, e.g. two push operations in parallel
> (from two different developers), or fetch from cron and commit?

Mercurial has write-side locks so there can only ever be one writer at
a time. There are no locks needed on the read side, so there can be
any number of readers, even while commits are happening.

> What happens if operation is interrupted (e.g. lost connection to
> network during fetch)?

We keep a simple transaction journal. As Mercurial revlogs are
append-only, rolling back a transaction just means truncating all
files in a transaction to their original length.
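
A rough sketch of the idea in the paragraph above, not Mercurial's actual code: because files only ever grow by appending, it is enough to remember each file's length before the first append, and a rollback is then a truncate back to that length. The `Transaction` class is invented for this illustration. It keeps the journal in memory, whereas a real journal would be written to disk first so that it survives an interrupted process.

```python
import os
import tempfile
from typing import Dict


class Transaction:
    """Toy journal: remember each file's length before the first append."""

    def __init__(self) -> None:
        self.original_length: Dict[str, int] = {}

    def append(self, path: str, data: bytes) -> None:
        if path not in self.original_length:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self.original_length[path] = size
        with open(path, "ab") as f:
            f.write(data)

    def rollback(self) -> None:
        # Undo every append by cutting each file back to its recorded length.
        for path, length in self.original_length.items():
            with open(path, "r+b") as f:
                f.truncate(length)
        self.original_length.clear()


workdir = tempfile.mkdtemp()
changelog = os.path.join(workdir, "changelog")
with open(changelog, "wb") as f:
    f.write(b"rev0\n")

tx = Transaction()
tx.append(changelog, b"rev1\n")
tx.append(changelog, b"rev2\n")
tx.rollback()

with open(changelog, "rb") as f:
    print(f.read())  # b'rev0\n'
```

Readers that only look at data up to the lengths recorded before the transaction are unaffected either way.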

> In git both situations result in some prune-able and fsck-visible crud in
> repository, but repository stays uncorrupted, and all operations are atomic
> (all or nothing).

If a Mercurial transaction is interrupted and not rolled back, the
result is prune-able and fsck-visible crud. But this doesn't happen
much in practice.

The claim that's been made is that a) truncate is unsafe because Linux
has historically had problems in this area and b) git is safer because
it doesn't do this sort of thing. 

My response is a) those problems are overstated and Linux has never
had difficulty with the sorts of straightforward single writer
operations Mercurial uses and b) normal git usage involves regular
rewrites of data with packing operations that makes its exposure to
filesystem bugs equivalent or greater.

In either case, both provide strong integrity checks with recursive
SHA1 hashing, zlib CRCs, and GPG signatures (as well as distributed
"back-up"!) so this is largely a non-issue relative to traditional
systems.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31 22:25       ` Matt Mackall
@ 2007-01-31 23:58         ` Jakub Narebski
  2007-02-01  0:34           ` Matt Mackall
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-01-31 23:58 UTC (permalink / raw)
  To: Matt Mackall; +Cc: mercurial, git, Junio C Hamano

Matt Mackall wrote:
> On Wed, Jan 31, 2007 at 11:56:01AM +0100, Jakub Narebski wrote:
>> Theodore Tso wrote:
>> 
>>> On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
>>>> I think hg modifies files as it goes, which could cause some issues
>>>> when a writer is aborted.  I'm sure they have thought about the
>>>> problem and tried to make it safe, but there isn't anything safer
>>>> than just leaving the damn thing alone.  :)
>>> 
>>> To be fair hg modifies files using O_APPEND only.  That isn't quite
>>> as safe as "only creating new files", but it is relatively safe.
>> 
>> From (libc.info):
>> 
>>  -- Macro: int O_APPEND
[...] 
>> I don't quite understand how that would help hg (Mercurial) to have
>> operations like commit, pull/fetch or push atomic, i.e. all or
>> nothing. 
> 
> That's because it's unrelated.
[...]
> Mercurial has write-side locks so there can only ever be one writer at
> a time. There are no locks needed on the read side, so there can be
> any number of readers, even while commits are happening.
> 
>> What happens if operation is interrupted (e.g. lost connection to
>> network during fetch)?
> 
> We keep a simple transaction journal. As Mercurial revlogs are
> append-only, rolling back a transaction just means truncating all
> files in a transaction to their original length.

Thanks a lot for complete answer. So Mercurial uses write-side locks
for dealing with concurrent operations, and transaction journal for
dealing with interrupted operations. I guess that incomplete transactions
are rolled back on next hg command...

I guess (please correct me if I'm wrong) that git uses "put reference
after putting data" scheme, and write-side lock in few places when it
is needed.
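
A minimal sketch of the "put reference after putting data" scheme as guessed at above, under simplifying assumptions: a flat `objects/` and `refs/` directory layout and made-up helper names (`new_repo`, `write_object`, `update_ref`, `read_ref`). It is not git's real on-disk format. New data is written under a new name and never modifies existing files. The ref is switched last, through a lock file created with O_EXCL and an atomic rename.

```python
import hashlib
import os
import tempfile


def new_repo() -> str:
    repo = tempfile.mkdtemp()
    os.makedirs(os.path.join(repo, "objects"))
    os.makedirs(os.path.join(repo, "refs"))
    return repo


def write_object(repo: str, content: bytes) -> str:
    """Store content under its hash; existing objects are never modified."""
    object_id = hashlib.sha1(content).hexdigest()
    path = os.path.join(repo, "objects", object_id)
    if not os.path.exists(path):
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(content)
        os.replace(tmp, path)  # atomic rename into place
    return object_id


def update_ref(repo: str, name: str, object_id: str) -> None:
    """Point the ref at an object that is already on disk."""
    path = os.path.join(repo, "refs", name)
    lock = path + ".lock"
    # O_EXCL makes this fail if another writer already holds the lock.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    with os.fdopen(fd, "w") as f:
        f.write(object_id + "\n")
    os.replace(lock, path)  # readers see the old ref or the new one


def read_ref(repo: str, name: str) -> str:
    with open(os.path.join(repo, "refs", name)) as f:
        return f.read().strip()


repo = new_repo()
oid = write_object(repo, b"commit 1\n")
update_ref(repo, "master", oid)
print(read_ref(repo, "master") == oid)  # True
```

If the process dies between `write_object` and `update_ref`, the result is an unreferenced object: crud that can be pruned later, with the old ref still valid.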

>> In git both situations result in some prune-able and fsck-visible crud in
>> repository, but repository stays uncorrupted, and all operations are atomic
>> (all or nothing).
> 
> If a Mercurial transaction is interrupted and not rolled back, the
> result is prune-able and fsck-visible crud. But this doesn't happen
> much in practice.
> 
> The claim that's been made is that a) truncate is unsafe because Linux
> has historically had problems in this area and b) git is safer because
> it doesn't do this sort of thing. 
> 
> My response is a) those problems are overstated and Linux has never
> had difficulty with the sorts of straightforward single writer
> operations Mercurial uses and b) normal git usage involves regular
> rewrites of data with packing operations that makes its exposure to
> filesystem bugs equivalent or greater.

Rewrites in git perhaps are (or should be) regular, but need not be frequent.
And with the new idea/feature of kept packs, a rewrite need not cover the full data.

One command which _is_ (a bit) unsafe in git is git-prune. I'm not sure
if it could be made safe. But skipping prune only slightly affects
repository size (where I think git is the best of all SCMs), not performance.

On the other hand, the hg repository structure (namely an append-only,
log-like changelog / revlog to store commits) makes it hard, I think, to
have multiple persistent branches.

Sidenote 1: it looks like git is optimized for speed of merge and checkout
(branch switching, or going to given point in history for bisect), and
probably accidentally for multi-branch repos, while Mercurial is optimized
for speed of commit and patch.

Sidenote 2: Mercurial repository structure might make it use "file-ids"
(perhaps implicitly), with all the disadvantages (different renames
on different branches) of those.

> In either case, both provide strong integrity checks with recursive
> SHA1 hashing, zlib CRCs, and GPG signatures (as well as distributed
> "back-up"!) so this is largely a non-issue relative to traditional
> systems.

Integrity checks can tell you that a repository is corrupted, but it would
be better if it didn't get corrupted in the first place.

Besides: zlib CRC for Mercurial? I thought that hg didn't compress the
data, and only stored it as delta chains?
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-01-31 23:58         ` Jakub Narebski
@ 2007-02-01  0:34           ` Matt Mackall
  2007-02-01  0:57             ` Jakub Narebski
  2007-02-02  9:55             ` Jakub Narebski
  0 siblings, 2 replies; 61+ messages in thread
From: Matt Mackall @ 2007-02-01  0:34 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: mercurial, git, Junio C Hamano

On Thu, Feb 01, 2007 at 12:58:42AM +0100, Jakub Narebski wrote:
> Matt Mackall wrote:
> > On Wed, Jan 31, 2007 at 11:56:01AM +0100, Jakub Narebski wrote:
> >> Theodore Tso wrote:
> >> 
> >>> On Tue, Jan 30, 2007 at 11:55:48AM -0500, Shawn O. Pearce wrote:
> >>>> I think hg modifies files as it goes, which could cause some issues
> >>>> when a writer is aborted.  I'm sure they have thought about the
> >>>> problem and tried to make it safe, but there isn't anything safer
> >>>> than just leaving the damn thing alone.  :)
> >>> 
> >>> To be fair hg modifies files using O_APPEND only.  That isn't quite
> >>> as safe as "only creating new files", but it is relatively safe.
> >> 
> >>>From (libc.info):
> >> 
> >>  -- Macro: int O_APPEND
> [...] 
> >> I don't quite understand how that would help hg (Mercurial) to have
> >> operations like commit, pull/fetch or push atomic, i.e. all or
> >> nothing. 
> > 
> > That's because it's unrelated.
> [...]
> > Mercurial has write-side locks so there can only ever be one writer at
> > a time. There are no locks needed on the read side, so there can be
> > any number of readers, even while commits are happening.
> > 
> >> What happens if operation is interrupted (e.g. lost connection to
> >> network during fetch)?
> > 
> > We keep a simple transaction journal. As Mercurial revlogs are
> > append-only, rolling back a transaction just means truncating all
> > files in a transaction to their original length.
> 
> Thanks a lot for complete answer. So Mercurial uses write-side locks
> for dealing with concurrent operations, and transaction journal for
> dealing with interrupted operations. I guess that incomplete transactions
> are rolled back on next hg command...

They are either automatically rolled back on abort or, if that fails
for some reason such as a power failure, the user is prompted to run "hg
recover" to complete the rollback. We also save the last transaction
journal, which allows one level of undo for pulls/commits.
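
A minimal sketch of the journal-and-truncate idea, in Python. This is not
Mercurial's actual code: the `Journal` class, its method names and the
journal file layout are all assumptions made up for illustration. It only
shows why append-only files make rollback a matter of truncation.

```python
import os
import tempfile


class Journal:
    """Hypothetical append-only transaction journal (illustration only)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path

    def begin(self, paths):
        # Record every file's length *before* anything is appended to it.
        with open(self.journal_path, "w") as j:
            for p in paths:
                size = os.path.getsize(p) if os.path.exists(p) else 0
                j.write(f"{size}\t{p}\n")
            j.flush()
            os.fsync(j.fileno())

    def commit(self):
        # The appended data is now official; the journal is no longer needed.
        os.remove(self.journal_path)

    def recover(self):
        # A leftover journal means a transaction was interrupted:
        # truncate each file back to its recorded length.
        if not os.path.exists(self.journal_path):
            return False
        with open(self.journal_path) as j:
            for line in j:
                size, path = line.rstrip("\n").split("\t", 1)
                if os.path.exists(path):
                    with open(path, "r+b") as f:
                        f.truncate(int(size))
        os.remove(self.journal_path)
        return True


def demo():
    d = tempfile.mkdtemp()
    log = os.path.join(d, "changelog")
    with open(log, "wb") as f:
        f.write(b"rev0\n")
    journal = Journal(os.path.join(d, "journal"))
    journal.begin([log])
    with open(log, "ab") as f:
        f.write(b"partial rev1")   # simulated crash: commit() never runs
    rolled_back = journal.recover()
    with open(log, "rb") as f:
        return rolled_back, f.read()


demo_result = demo()
```

Because nothing before the recorded offset is ever touched, the rollback
never has to restore old bytes; it only has to discard new ones.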

> I guess (please correct me if I'm wrong) that git uses "put reference
> after putting data" scheme, and write-side lock in few places when it
> is needed.

Mercurial also uses a "put reference after putting data" scheme, which is
what allows us to have no read vs write locking.
  
> >> In git both situations result in some prune-able and fsck-visible crud in
> >> repository, but repository stays uncorrupted, and all operations are atomic
> >> (all or nothing).
> > 
> > If a Mercurial transaction is interrupted and not rolled back, the
> > result is prune-able and fsck-visible crud. But this doesn't happen
> > much in practice.
> > 
> > The claim that's been made is that a) truncate is unsafe because Linux
> > has historically had problems in this area and b) git is safer because
> > it doesn't do this sort of thing. 
> > 
> > My response is a) those problems are overstated and Linux has never
> > had difficulty with the sorts of straightforward single writer
> > operations Mercurial uses and b) normal git usage involves regular
> > rewrites of data with packing operations that makes its exposure to
> > filesystem bugs equivalent or greater.
> 
> Rewrites in git perhaps are (or should be) regular, but need not be often.
> And with new idea/feature of kept packs rewrite need not be of full data.

If the set of files in a given commit (say tip) gets spread out across
an arbitrary number of packs ordered by last modification time,
performance degrades to O(n) lookups and random seeking.

> One command which _is_ (a bit) unsafe in git is git-prune. I'm not sure
> if it could be made safe. But not doing prune affects only a bit
> repository size (where git is best I think of all SCMs) and not performance.
> 
> On the other hand hg repository structure (namely log like append changelog
> / revlog to store commits) makes it I think hard to have multiple persistent
> branches.

Not sure why you think that. There are some difficulties here, but
they're mostly owing to the fact that we've always emphasized the one
branch per repo approach as being the most user-friendly.

> Sidenote 1: it looks like git is optimized for speed of merge and checkout
> (branch switching, or going to given point in history for bisect), and
> probably accidentally for multi-branch repos, while Mercurial is optimized
> for speed of commit and patch.

I think all of these things are comparable.

> Sidenote 2: Mercurial repository structure might make it use "file-ids"
> (perhaps implicitly), with all the disadvantages (different renames
> on different branches) of those.

Nope.

> > In either case, both provide strong integrity checks with recursive
> > SHA1 hashing, zlib CRCs, and GPG signatures (as well as distributed
> > "back-up"!) so this is largely a non-issue relative to traditional
> > systems.
> 
> Integrity checks can tell you that repository is corrupted, but it would
> be better if it didn't get corrupted in first place.

Obviously. Hence our append-only design. Data that's written to a repo
is never rewritten, which minimizes exposure to software bugs and I/O
errors.
 
> Besides: zlib CRC for Mercurial? I thought that hg didn't compress the
> data, only delta chain store it?

We use zlib compression of deltas and have since April 6, 2005.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-01  0:34           ` Matt Mackall
@ 2007-02-01  0:57             ` Jakub Narebski
  2007-02-01  7:59               ` Simon 'corecode' Schubert
  2007-02-02  9:55             ` Jakub Narebski
  1 sibling, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-02-01  0:57 UTC (permalink / raw)
  To: Matt Mackall; +Cc: mercurial, git, Junio C Hamano

Matt Mackall wrote:
> On Thu, Feb 01, 2007 at 12:58:42AM +0100, Jakub Narebski wrote:

>> Sidenote 1: it looks like git is optimized for speed of merge and checkout
>> (branch switching, or going to given point in history for bisect), and
>> probably accidentally for multi-branch repos, while Mercurial is optimized
>> for speed of commit and patch.
> 
> I think all of these things are comparable.

Hierarchical tree objects in git optimize for speed of merge and checkout
IMVHO, as you need to compare only one hash to know whether you have to
descend into a subdirectory or whether that subdirectory hasn't changed.
The flat manifest file in Mercurial (and also the "filename buckets") makes
commits faster, I think.
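
To illustrate the "one hash per subdirectory" point, here is a rough Python
sketch. It does not use git's real object format; the `build` and
`changed_paths` helpers and the way entries are hashed are assumptions for
illustration only. Each tree caches a hash over its entries, so comparing two
trees can skip any subdirectory whose hash is identical on both sides.

```python
import hashlib


def build(entry):
    """Turn nested dicts/bytes into (kind, hash, children), caching the hash."""
    if isinstance(entry, bytes):
        return ("blob", hashlib.sha1(entry).hexdigest(), None)
    children = {name: build(child) for name, child in entry.items()}
    h = hashlib.sha1()
    for name in sorted(children):
        kind, digest, _ = children[name]
        h.update(f"{kind} {name} {digest}\n".encode())
    return ("tree", h.hexdigest(), children)


def changed_paths(old, new, prefix="", stats=None):
    """List paths that differ between two trees, skipping equal subtrees."""
    if stats is not None:
        stats["trees_visited"] += 1
    out = []
    old_children, new_children = old[2], new[2]
    for name in sorted(set(old_children) | set(new_children)):
        a, b = old_children.get(name), new_children.get(name)
        path = prefix + name
        if a is not None and b is not None and a[0] == b[0] and a[1] == b[1]:
            continue  # same kind and hash: file or whole subtree is unchanged
        if a is not None and b is not None and a[0] == b[0] == "tree":
            out += changed_paths(a, b, path + "/", stats)
        else:
            out.append(path)
    return out


old_root = build({
    "Makefile": b"all:\n",
    "src": {"a.c": b"int a;\n", "b.c": b"int b;\n"},
    "doc": {"README": b"hi\n"},
})
new_root = build({
    "Makefile": b"all:\n",
    "src": {"a.c": b"int a;\n", "b.c": b"int b = 1;\n"},
    "doc": {"README": b"hi\n"},
})
stats = {"trees_visited": 0}
result = changed_paths(old_root, new_root, stats=stats)
```

In this toy example only the root and `src` are walked; `doc` is skipped
after a single hash comparison, however many files it contains.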

>> Sidenote 2: Mercurial repository structure might make it use "file-ids"
>> (perhaps implicitly), with all the disadvantages (different renames
>> on different branches) of those.
> 
> Nope.

How is that so, if the blobs (file contents) are stored hashed by filename?
IIRC hg has some scheme to deal with renames, but it is file-id (file
identity) based AFAIK.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-01  0:57             ` Jakub Narebski
@ 2007-02-01  7:59               ` Simon 'corecode' Schubert
  2007-02-01 10:09                 ` Johannes Schindelin
  0 siblings, 1 reply; 61+ messages in thread
From: Simon 'corecode' Schubert @ 2007-02-01  7:59 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Matt Mackall, mercurial, git

Jakub Narebski wrote:
>>> Sidenote 2: Mercurial repository structure might make it use "file-ids"
>>> (perhaps implicitly), with all the disadvantages (different renames
>>> on different branches) of those.
>> Nope.
> How it is so, if the blobs (file contents) are stored filename hashed?
> IIRC hg has some scheme to deal with renames, but it is file-id (file
> identity) based AFAIK.

No, the buckets are simply the filename.  If you rename, you take the penalty of duplicating the content (compressed) with a new name.  No big deal there.  So there are *no* file-ids.  Blobs go into the data/index file which corresponds to their filename.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-01  7:59               ` Simon 'corecode' Schubert
@ 2007-02-01 10:09                 ` Johannes Schindelin
  2007-02-01 10:15                   ` Simon 'corecode' Schubert
  0 siblings, 1 reply; 61+ messages in thread
From: Johannes Schindelin @ 2007-02-01 10:09 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: git

Hi,

[culled many people from the Cc: list to avoid a flamewar]

On Thu, 1 Feb 2007, Simon 'corecode' Schubert wrote:

> If you rename, you take the penalty of duplicating the content 
> (compressed) with a new name.  No big deal there. So there are *no* 
> file-ids.  Blobs go into the data/index file which corresponds to their 
> filename.

So, can you explain to me how a filename is _not_ a file-id?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-01 10:09                 ` Johannes Schindelin
@ 2007-02-01 10:15                   ` Simon 'corecode' Schubert
  2007-02-01 10:49                     ` Johannes Schindelin
  2007-02-01 16:28                     ` Linus Torvalds
  0 siblings, 2 replies; 61+ messages in thread
From: Simon 'corecode' Schubert @ 2007-02-01 10:15 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin wrote:
>> If you rename, you take the penalty of duplicating the content 
>> (compressed) with a new name.  No big deal there. So there are *no* 
>> file-ids.  Blobs go into the data/index file which corresponds to their 
>> filename.
> So, can you explain to me how a filename is _not_ a file-id?

It is not a file-id in the sense that other SCMs use the term (monotone, I think, though I'm not sure).  If you copy/move the content to a new name, the ID will not stay the same.  Just see it as a hash bucket which allows you easy access to the history for a file currently with this name.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-01 10:15                   ` Simon 'corecode' Schubert
@ 2007-02-01 10:49                     ` Johannes Schindelin
  2007-02-01 16:28                     ` Linus Torvalds
  1 sibling, 0 replies; 61+ messages in thread
From: Johannes Schindelin @ 2007-02-01 10:49 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: git

Hi,

On Thu, 1 Feb 2007, Simon 'corecode' Schubert wrote:

> Johannes Schindelin wrote:
> > > If you rename, you take the penalty of duplicating the content
> > > (compressed) with a new name.  No big deal there. So there are *no*
> > > file-ids.  Blobs go into the data/index file which corresponds to their
> > > filename.
> > So, can you explain to me how a filename is _not_ a file-id?
> 
> It is not a file-id like other SCM use it (I think monotone, not sure though).
> If you copy/move the content to a new name, the ID will not stay the same.
> Just see it as a hash bucket which allows you easy access to the history for a
> file currently with this name.

Ah, thanks. I misunderstood the meaning of file-id in _that_ context.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-01 10:15                   ` Simon 'corecode' Schubert
  2007-02-01 10:49                     ` Johannes Schindelin
@ 2007-02-01 16:28                     ` Linus Torvalds
  2007-02-01 19:36                       ` Eric Wong
  1 sibling, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-02-01 16:28 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: Johannes Schindelin, git

On Thu, 1 Feb 2007, Simon 'corecode' Schubert wrote:
>
> > So, can you explain to me how a filename is _not_ a file-id?
> 
> It is not a file-id like other SCM use it (I think monotone, not sure though).
> If you copy/move the content to a new name, the ID will not stay the same.
> Just see it as a hash bucket which allows you easy access to the history for a
> file currently with this name.

Well, that's actually just another "file ID" too. It's just not an "inode 
number" kind of file ID, it's more the "CVS file ID" kind of ID.

SVN uses "inode numbers" (I think they are just UUID's generated at "svn 
add" time, but I'm not sure) to track file ID's across renames. Some other 
SCM's do the same.

CVS uses "pathname" as the file ID (which obviously doesn't need any 
separate generation at all), which is why you have to do horrible things 
to track file ID's across renames (ie you really can't, but you *can* copy 
or move the *,v file so that your *new* "file ID" also has the same 
history as your old one).

So both of those are "file ID's" - they are what is used to index into the 
history, and they have real meaning for very fundamental operations.

You can view git as "closer" to CVS, in the sense that it certainly 
doesn't have the SVN kind of location-independent ID, and it _is_ able to 
look back in history using the path-name. So in that sense, you can 
certainly claim that the pathname is the "file ID" in git too, and that 
git is closer to CVS than to SVN.

But unlike SVN or CVS, there is no real fundamental "meaning" to the 
pathname in git. Sure, you can use the pathname to trace history of a 
file, but on the other hand, you can use a random aggregation of pathnames 
to track history of a set of files and directories, and the pathnames 
actually exist even when the file doesn't. So there obviously isn't any 
1:1 relationship, neither in usage, nor in any internal implementation.

So at least for me, "file ID" means "identifier for a particular chain of 
history". THAT exists in both CVS and SVN (it's a pathname and an "inode 
number" respectively), but does not exist in git at all.

			Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-01 16:28                     ` Linus Torvalds
@ 2007-02-01 19:36                       ` Eric Wong
  2007-02-01 21:13                         ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Wong @ 2007-02-01 19:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon 'corecode' Schubert, Johannes Schindelin, git

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, 1 Feb 2007, Simon 'corecode' Schubert wrote:
> >
> > > So, can you explain to me how a filename is _not_ a file-id?
> > 
> > It is not a file-id like other SCM use it (I think monotone, not sure though).
> > If you copy/move the content to a new name, the ID will not stay the same.
> > Just see it as a hash bucket which allows you easy access to the history for a
> > file currently with this name.
> 
> Well, that's actually just another "file ID" too. It's just not an "inode 
> number" kind of file ID, it's more the "CVS file ID" kind of ID.
> 
> SVN uses "inode numbers" (I think they are just UUID's generated at "svn 
> add" time, but I'm not sure) to track file ID's across renames. Some other 
> SCM's do the same.

I think you got this part confused with GNU Arch (and possibly
Bzr).  SVN tracks renames in the changeset, it records (in the log)
a copy and delete.  pathname@revision is the only "file ID" I know
about in SVN.

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-01 19:36                       ` Eric Wong
@ 2007-02-01 21:13                         ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-02-01 21:13 UTC (permalink / raw)
  To: Eric Wong; +Cc: Simon 'corecode' Schubert, Johannes Schindelin, git



On Thu, 1 Feb 2007, Eric Wong wrote:
> > SVN uses "inode numbers" (I think they are just UUID's generated at "svn 
> > add" time, but I'm not sure) to track file ID's across renames. Some other 
> > SCM's do the same.
> 
> I think you got this part confused with GNU Arch (and possibly
> Bzr).  SVN tracks renames in the changeset, it records (in the log)
> a copy and delete.  pathname@revision is the only "file ID" I know
> about in SVN.

Ahh, I was sure the revision files in FSFS were per-file, but color me 
corrected - they seem to be per-revision.

My bad.

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-01  0:34           ` Matt Mackall
  2007-02-01  0:57             ` Jakub Narebski
@ 2007-02-02  9:55             ` Jakub Narebski
  2007-02-02 13:51               ` Simon 'corecode' Schubert
                                 ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Jakub Narebski @ 2007-02-02  9:55 UTC (permalink / raw)
  To: Matt Mackall; +Cc: mercurial, git, Junio C Hamano

On Thu, 01 Feb 2007 00:00:00 +0100, Matt Mackall wrote:
> On Thu, Feb 01, 2007 at 12:58:42AM +0100, Jakub Narebski wrote:
>> Matt Mackall wrote:

>> On the other hand hg repository structure (namely log like append changelog
>> / revlog to store commits) makes it I think hard to have multiple persistent
>> branches.
> 
> Not sure why you think that. There are some difficulties here, but
> they're mostly owing to the fact that we've always emphasized the one
> branch per repo approach as being the most user-friendly.

Well, perhaps I should say that the append-only changelog / revlog[*1*]
structure used to store commits makes it natural to have one branch per
repository: a branch (in the "lineage of a given commit" sense, i.e. all
commits which are ancestors of that commit) is roughly equivalent to the
changelog / revlog, and the branch tip (latest commit on a branch) is the
top commit (latest entry) in the changelog / revlog.

In git, the DAG (directed acyclic graph) of commits, with a branch tip being
a moving pointer (moving like a top-of-stack pointer) to a commit in the DAG,
makes it natural to have multiple branches in a repository (the current branch
is the branch pointed to by HEAD, another pointer, this time to a branch[*2*]).
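
A toy model of that picture in Python, as a sketch only: the `Repo` class and
its methods are invented for illustration and do not mirror git's on-disk
layout. Commits form a DAG through parent links, each branch is just a name
pointing at one commit, and HEAD names the current branch.

```python
class Repo:
    """Hypothetical in-memory model: a commit DAG plus movable branch pointers."""

    def __init__(self):
        self.parents = {}      # commit id -> list of parent ids (the DAG)
        self.refs = {}         # branch name -> commit id (the branch tip)
        self.head = "master"   # HEAD points at a branch, not at a commit

    def commit(self, cid, extra_parents=()):
        tip = self.refs.get(self.head)
        self.parents[cid] = ([tip] if tip else []) + list(extra_parents)
        self.refs[self.head] = cid   # only the current branch pointer moves

    def branch(self, name):
        # A new branch is just another pointer to the current tip.
        self.refs[name] = self.refs[self.head]

    def checkout(self, name):
        if name not in self.refs:
            raise KeyError(name)
        self.head = name

    def lineage(self, name):
        """All commits reachable from the tip of the given branch."""
        seen, stack = set(), [self.refs[name]]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return seen


repo = Repo()
repo.commit("A")
repo.commit("B")
repo.branch("topic")
repo.checkout("topic")
repo.commit("C")
repo.checkout("master")
repo.commit("D")
```

Creating `topic` costs one dictionary entry, and committing on one branch
leaves the other pointer where it was, which is why many branches per
repository come cheaply in this model.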

Perhaps a multiple-branch repository makes the learning curve a bit steeper,
but it also encourages using temporary branches and topic branches, which
makes _development_ (as opposed to using the version control tool) more
(power)user-friendly, and makes the SCM more powerful.

How does Mercurial solve the problem of multiple _persistent_ branches? Does
it add pointers to commits somewhere deeper in the changelog / revlog?

BTW does Mercurial have tags?

>>> In either case, both provide strong integrity checks with recursive
>>> SHA1 hashing, zlib CRCs, and GPG signatures (as well as distributed
>>> "back-up"!) so this is largely a non-issue relative to traditional
>>> systems.
>> 
>> Integrity checks can tell you that repository is corrupted, but it would
>> be better if it didn't get corrupted in first place.
> 
> Obviously. Hence our append-only design. Data that's written to a repo
> is never rewritten, which minimizes exposure to software bugs and I/O
> errors.

By the way, RCS / CVS rewrote the relevant data (to keep the
diff-from-the-top structure) on each commit.

I wonder if git could generate a pack on the fly, fastimport-style...

>> Besides: zlib CRC for Mercurial? I thought that hg didn't compress the
>> data, only delta chain store it?
> 
> We use zlib compression of deltas and have since April 6, 2005.

Nice to know. Do you compress only file deltas, or also file revision
metadata? Do you compress manifests (trees) and commits (or at least
commit messages) too?

Footnotes:
----------

[*1*] I don't know what nomenclature Mercurial uses for blobs (file
contents), trees (directory contents) and commits (revision contents)
storage.

[*2*] I disregard here latest work on "detached HEAD" in git.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02  9:55             ` Jakub Narebski
@ 2007-02-02 13:51               ` Simon 'corecode' Schubert
  2007-02-02 14:23                 ` Jakub Narebski
  2007-02-02 15:38               ` Mark Wooding
  2007-02-02 16:03               ` Matt Mackall
  2 siblings, 1 reply; 61+ messages in thread
From: Simon 'corecode' Schubert @ 2007-02-02 13:51 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
>>> Integrity checks can tell you that repository is corrupted, but it would
>>> be better if it didn't get corrupted in first place.
>> Obviously. Hence our append-only design. Data that's written to a repo
>> is never rewritten, which minimizes exposure to software bugs and I/O
>> errors.
> 
> By the way, RCS / CVS rewrote relevant data (to have diff from the top
> structure) on each commit.
> 
> I wonder if git could generate pack on the fly fastimport like...

What do you mean by that?  Generate the pack on which occasion?  CVS import?  I do this already.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 13:51               ` Simon 'corecode' Schubert
@ 2007-02-02 14:23                 ` Jakub Narebski
  2007-02-02 15:02                   ` Shawn O. Pearce
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-02-02 14:23 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: git

Simon 'corecode' Schubert wrote:
> Jakub Narebski wrote:
>> 
>> By the way, RCS / CVS rewrote relevant data (to have diff from the top
>> structure) on each commit.
>> 
>> I wonder if git could generate pack on the fly fastimport like...
> 
> What do you mean with that?  generate the pack on which occasion?
> CVS import?  I do this already. 

On commit.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 14:23                 ` Jakub Narebski
@ 2007-02-02 15:02                   ` Shawn O. Pearce
  0 siblings, 0 replies; 61+ messages in thread
From: Shawn O. Pearce @ 2007-02-02 15:02 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Simon 'corecode' Schubert, git

Jakub Narebski <jnareb@gmail.com> wrote:
> Simon 'corecode' Schubert wrote:
> > Jakub Narebski wrote:
> >> 
> >> By the way, RCS / CVS rewrote relevant data (to have diff from the top
> >> structure) on each commit.
> >> 
> >> I wonder if git could generate pack on the fly fastimport like...
> > 
> > What do you mean with that?  generate the pack on which occasion?
> > CVS import?  I do this already. 
> 
> On commit.

I've thought about doing this.  Except there are three independent
processes occurring during commit that generate objects:

	update-index
	write-tree
	commit-tree

and the update-index portion is also git-add, which we have now
started to encourage users to do ahead of time as often as needed,
prior to running git-commit.  It's also the one that generates the
largest set of new objects for most projects.

One problem is that we have a rule: "don't delta an object
which is already in a pack, unless -f is given".  This is one of
the reasons `git repack -a -d -l` is so dang fast.  It's assuming
all new stuff is loose, and therefore should be delta'd, but the
old stuff which we have already delta'd is kept as-is.

Basically I've thought about doing this (after my work in gfi)
and decided it's not worth the level of effort involved at this time.
So I'm not going to do it.  Someone else can try.  ;-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02  9:55             ` Jakub Narebski
  2007-02-02 13:51               ` Simon 'corecode' Schubert
@ 2007-02-02 15:38               ` Mark Wooding
  2007-02-02 16:09                 ` Jakub Narebski
  2007-02-02 16:03               ` Matt Mackall
  2 siblings, 1 reply; 61+ messages in thread
From: Mark Wooding @ 2007-02-02 15:38 UTC (permalink / raw)
  To: git

Jakub Narebski <jnareb@gmail.com> wrote:

> BTW does Mercurial have tags?

Yes.  Mercurial stores tags in text files, one per line, mapping the tag
name to a SHA1 hash of the tagged revision.  There are two files of
tags: `local' tags go into .hg/tags (or somesuch) and don't get copied
by clone; global tags go into .hgtags and do get copied (of course,
since they're part of the source tree).

If I may be opinionated for a bit: this is barking for two reasons:

  * The tags files grow by having lines added to the bottom.  Files of
    this kind are almost ideal for causing merge conflicts, and there's
    no automatic means for resolving them.  (I actually wrote a custom
    tags merger recently -- if anyone wants it, just mail me.)

  * If I visit a tag, and then decide I want to visit some other, more
    recent tag, I'm screwed because it obviously didn't exist in that
    old revision.  Tying tags to the revision history in this way is
    truly daft.
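
For what it's worth, a merger for files that only ever grow at the bottom
can be quite small. The following Python sketch is not the custom merger
mentioned above; `merge_append_only` is a hypothetical helper, and the
"<hash> <tag>" lines are placeholder data, not real revisions. It assumes
both sides only appended to the common base and that the order of newly
added lines across the two sides does not matter.

```python
def merge_append_only(base, ours, theirs):
    """Three-way merge for line lists that only grow by appending."""
    n = len(base)
    if ours[:n] != base or theirs[:n] != base:
        # Someone edited or removed existing lines: not our case.
        raise ValueError("not an append-only change; use a normal merge")
    merged = list(ours)
    seen = set(ours)
    for line in theirs[n:]:
        if line not in seen:       # drop lines both sides added identically
            merged.append(line)
            seen.add(line)
    return merged


base = ["aaa v1.0"]
ours = base + ["bbb v1.1"]
theirs = base + ["ccc v1.2", "bbb v1.1"]
merged = merge_append_only(base, ours, theirs)
```

If a later line for the same tag is meant to override an earlier one, the
order in which the two sides' additions are concatenated does matter, and
this sketch would need a real policy for that case.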

-- [mdw]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02  9:55             ` Jakub Narebski
  2007-02-02 13:51               ` Simon 'corecode' Schubert
  2007-02-02 15:38               ` Mark Wooding
@ 2007-02-02 16:03               ` Matt Mackall
  2007-02-02 17:18                 ` Jakub Narebski
  2 siblings, 1 reply; 61+ messages in thread
From: Matt Mackall @ 2007-02-02 16:03 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: mercurial, git, Junio C Hamano

On Fri, Feb 02, 2007 at 10:55:48AM +0100, Jakub Narebski wrote:
> How Mercurial solves problem of multiple _persistent_ branches? Does it
> add pointers to commits somewhere deeper in changelog / revlog?

Each changeset may have a branch marker.

Here's branches in use with an import of mutt's CVS history:

$ hg branches
mutt-0-94                      208:b2cc0abd8fe0
HEAD                           207:a505693b54c1
mutt-0-93                      134:d59345944030
muttintl                       1:29510de8b3fc
$ hg co HEAD
176 files updated, 0 files merged, 8 files removed, 0 files unresolved
$ hg branch
HEAD
$ hg branch devel
$ hg branch
devel
$ hg branch devel

> BTW does Mercurial have tags?

Yes. Both local and revision-controlled.

> Nice to know. You compress only file deltas, or also file revision
> metadata? Do you compress manifests (trees) and commits (or at least
> commit messages) too?

All three use the same underlying storage format, so yes.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 15:38               ` Mark Wooding
@ 2007-02-02 16:09                 ` Jakub Narebski
  2007-02-02 16:42                   ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-02-02 16:09 UTC (permalink / raw)
  To: git; +Cc: mercurial

Mark Wooding wrote:

> Jakub Narebski <jnareb@gmail.com> wrote:
> 
>> BTW does Mercurial have tags?
> 
> Yes.  Mercurial stores tags in text files, one per line, mapping the tag
> name to a SHA1 hash of the tagged revision.  There are two files of
> tags: `local' tags go into .hg/tags (or somesuch) and don't get copied
> by clone; global tags go into .hgtags and do get copied (of course,
> since they're part of the source tree).

Gaaah. Why would anyone want to have non-propagated tags?

Do I understand correctly that Mercurial doesn't have annotated tags
(this also means that it doesn't have PGP/GPG signed tags), and has only
the equivalent of git's lightweight tags?
 
> If I may be opinionated for a bit: this is barking for two reasons:
> 
>   * The tags files grow by having lines added to the bottom.  Files of
>     this kind are almost ideal for causing merge conflicts, and there's
>     no automatic means for resolving them.  (I actually wrote a custom
>     tags merger recently -- if anyone wants it, just mail me.)

Such a merger (merge strategy) would be also useful for other log-like
files, e.g. ChangeLogs and such.

>   * If I visit a tag, and then decide I want to visit some other, more
>     recent tag, I'm screwed because it obviously didn't exist in that
>     old revision.  Tying tags to the revision history in this way is
>     truly daft.

In git, tags are direct or indirect (via a tag object, which makes an
annotated tag) pointers to points in revision history (in the DAG of
commits). Well, you can tag any object, which is used for example in git.git
to store Junio's out-of-tree GPG key used to sign release tags.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 16:09                 ` Jakub Narebski
@ 2007-02-02 16:42                   ` Linus Torvalds
  2007-02-02 16:59                     ` Jakub Narebski
  2007-02-02 17:59                     ` Brendan Cully
  0 siblings, 2 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-02-02 16:42 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, mercurial

On Fri, 2 Feb 2007, Jakub Narebski wrote:
> 
> Gaaah. Why anyone would want to have non-propagated tags?

That's *definitely* not the mistake.

I use private tags (and branches, for that matter) all the time. I'd be 
very upset indeed if all my tags were always pushed out when I push 
something out.

The mistake seems to be to think that tags get "versioned", and are part 
of the tree history. That's insane. It means that you can never have a tag 
to a newer tree than the one you are on.

Tags are *independent* of history. They must be. They are "outside" 
history, since the whole point of tags are to point to history.

The same is obviously true of branches. The fact that my "master" branch 
is at some point in time should *not* version my "other" branch. So 
branches - like tags - must not be "inside" the history.
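
A rough sketch of the difference, assuming the naive model where the tag
list is read from whatever revision is checked out. The revision ids and
both lookup functions are invented for the example; they do not describe
how any particular tool actually resolves tags.

```python
# Linear toy history: each revision records the tags file *as committed there*.
history = [
    {"id": "r1", "tags_file": {}},
    {"id": "r2", "tags_file": {"v1.0": "r1"}},  # tagging r1 needed a new commit
    {"id": "r3", "tags_file": {"v1.0": "r1"}},
    {"id": "r4", "tags_file": {"v1.0": "r1", "v2.0": "r3"}},
]


def in_tree_lookup(checked_out, tag):
    """Resolve a tag using only the tags file of the checked-out revision."""
    rev = next(r for r in history if r["id"] == checked_out)
    return rev["tags_file"].get(tag)


# Tags kept outside history: one mapping beside the commits, not inside them.
refs_tags = {"v1.0": "r1", "v2.0": "r3"}


def out_of_tree_lookup(checked_out, tag):
    """Resolve a tag from the separate namespace; the checkout is irrelevant."""
    return refs_tags.get(tag)


# Sitting on the old revision r1, the newer tag is invisible in-tree...
print(in_tree_lookup("r1", "v2.0"))
# ...but resolves fine when tags are not part of the versioned tree.
print(out_of_tree_lookup("r1", "v2.0"))
```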

> > If I may be opinionated for a bit: this is barking for two reasons:
> > 
> >   * The tags files grow by having lines added to the bottom.  Files of
> >     this kind are almost ideal for causing merge conflicts, and there's
> >     no automatic means for resolving them.  (I actually wrote a custom
> >     tags merger recently -- if anyone wants it, just mail me.)
> 
> Such a merger (merge strategy) would be also useful for other log-like
> files, e.g. ChangeLogs and such.

Yeah. I think per-file merge strategies are fine. We may not do them in 
git (nothing fundamental, it just hasn't come up as a real issue, although 
I think somebody was talking about how he ended up just using a special 
"merge" program that looked at the filename), but there is definitely 
nothing wrong with the concept.

And it solves that particular problem for tag-files, but it doesn't change 
the fact that keeping tags inside of history is insane in the first place 
(so it's not a problem that *should* be solved!)
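
For what it's worth, a per-file merge for append-only files can be very
small. The sketch below is a "union" merge under the assumption that both
sides only ever add lines at the bottom; union_merge and the sample lines
are hypothetical, and this is not the custom tags merger mentioned above.

```python
def union_merge(base, ours, theirs):
    """Merge two append-only line lists that share a common ancestor.

    Because both sides start with the base lines, walking base, then ours,
    then theirs and skipping lines already seen yields: the base lines,
    the lines we appended, then the lines only they appended.
    """
    seen = set()
    merged = []
    for line in base + ours + theirs:
        if line not in seen:
            seen.add(line)
            merged.append(line)
    return merged


base = ["aaaa v1.0"]
ours = base + ["bbbb v1.1"]
theirs = base + ["cccc v1.2-rc"]
print(union_merge(base, ours, theirs))
```

The limits are worth stating: a line deleted on one side comes back, and if
both sides add the same name with different ids, both lines survive and
something else has to decide which one wins.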

			Linus


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 16:42                   ` Linus Torvalds
@ 2007-02-02 16:59                     ` Jakub Narebski
  2007-02-02 17:11                       ` Linus Torvalds
  2007-02-02 17:59                     ` Brendan Cully
  1 sibling, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-02-02 16:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, mercurial

Linus Torvalds wrote:
> 
> On Fri, 2 Feb 2007, Jakub Narebski wrote:
>> 
>> Gaaah. Why anyone would want to have non-propagated tags?
> 
> That's *definitely* not the mistake.

Ermmm... right. Now that I thought about it a bit...

> I use private tags (and branches, for that matter) all the time. I'd be 
> very upset indeed if all my tags were always pushed out when I push 
> something out.

Well, in git you can have private tags (anything not under refs/tags
or under refs/heads is by default private), but I think you can only
have not published branches (which are not pushed to public repository).
If it is not true, then how one can have private branches 
(i.e. branches which 'push --all' would not push)?

-- 
Jakub Narebski
Poland


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 16:59                     ` Jakub Narebski
@ 2007-02-02 17:11                       ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-02-02 17:11 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, mercurial

On Fri, 2 Feb 2007, Jakub Narebski wrote:
> 
> Well, in git you can have private tags (anything not under refs/tags
> or under refs/heads is by default private), but I think you can only
> have not published branches (which are not pushed to public repository).
> If it is not true, then how one can have private branches 
> (i.e. branches which 'push --all' would not push)?

I have private branches, I just don't push them. The same thing is true of 
tags. 

Anybody who actually publishes his own git directory *directly* to others 
is probably insane. It's like showing your home directory. You just 
shouldn't do it. So anything in a real development archive is - by 
definition - "private". Only when you actually expose it explicitly (by 
exporting it at some public place) do things become public.

But if you tie your tags to history, you *have* to push them as you push 
the history.

So again, this is not about private vs public. The bug is not there. The 
bug is thinking that you should make tags part of your history.

		Linus


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 16:03               ` Matt Mackall
@ 2007-02-02 17:18                 ` Jakub Narebski
  2007-02-02 17:37                   ` Matt Mackall
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-02-02 17:18 UTC (permalink / raw)
  To: Matt Mackall; +Cc: mercurial, git, Junio C Hamano

On 02.02.2007, Matt Mackall wrote:
> On Fri, Feb 02, 2007 at 10:55:48AM +0100, Jakub Narebski wrote:

>> How Mercurial solves problem of multiple _persistent_ branches? Does it
>> add pointers to commits somewhere deeper in changelog / revlog?
> 
> Each changeset may have a branch marker.

By changeset you mean commit-revlog (changelog)? 

Where are those branch markers stored? Are those markers moving pointers,
meaning that if you make a commit while on a branch, the branch marker for
the current branch will move?

Static markers cannot identify branch in the presence of branch points:

                   ---a<---b ........ side branch
                  /
  1<---2<---3<---4<---5<---6<---7 ... main branch
            ^
            :   
             ''''' tag

> Here's branches in use with an import of mutt's CVS history:
> 
> $ hg branches
> mutt-0-94                      208:b2cc0abd8fe0
> HEAD                           207:a505693b54c1
> mutt-0-93                      134:d59345944030
> muttintl                       1:29510de8b3fc

What is the first number? I understand that second is shortened (is it
stored shortened, I wonder) hash identifier of a commit...

> $ hg co HEAD
> 176 files updated, 0 files merged, 8 files removed, 0 files unresolved

Git (at least for now) writes nothing on checkout; it is planned that
it would write changes status-like; perhaps summary would be enough...
or is it only working area status that is to be written...

> $ hg branch
> HEAD
> $ hg branch devel
> $ hg branch
> devel
> $ hg branch devel
> 
>> BTW does Mercurial have tags?
> 
> Yes. Both local and revision-controlled.

Revision-controlled (in-tree) tags are inane idea. Tags are non-moving
(and sometimes annotated) pointers to given point in history. They should
not depend on which branch you are, or what version you have checked out.

Otherwise the following would not work:
 $ git reset --hard v1.0.0
 $ git reset --hard v1.4.4.4
(it could be "git checkout" instead of "git reset --hard" in 'master'
version of git, with "detached HEAD" / "anonymous branch" feature).

>> Nice to know. You compress only file deltas, or also file revision
>> metadata? Do you compress manifests (trees) and commits (or at least
>> commit messages) too?
> 
> All three use the same underlying storage format, so yes.

But do you compress metadata (like base of a delta for file deltas,
authorship of a commit and reference to manifest-log entry)? Is the
manifest delta-encoded?

-- 
Jakub Narebski
Poland


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 17:18                 ` Jakub Narebski
@ 2007-02-02 17:37                   ` Matt Mackall
  2007-02-02 18:44                     ` Jakub Narebski
  0 siblings, 1 reply; 61+ messages in thread
From: Matt Mackall @ 2007-02-02 17:37 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: mercurial, git, Junio C Hamano

On Fri, Feb 02, 2007 at 06:18:10PM +0100, Jakub Narebski wrote:
> Revision-controlled (in-tree) tags are inane idea. Tags are non-moving
> (and sometimes annotated) pointers to given point in history. They should
> not depend on which branch you are, or what version you have checked out.

And.. they don't!

I'm now officially done correcting your uninformed perceptions. Come
back when you've actually looked at the docs.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 16:42                   ` Linus Torvalds
  2007-02-02 16:59                     ` Jakub Narebski
@ 2007-02-02 17:59                     ` Brendan Cully
  2007-02-02 18:19                       ` Jakub Narebski
                                         ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Brendan Cully @ 2007-02-02 17:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mercurial, git, Jakub Narebski

On Friday, 02 February 2007 at 08:42, Linus Torvalds wrote:
> 
> 
> On Fri, 2 Feb 2007, Jakub Narebski wrote:
> > 
> > Gaaah. Why anyone would want to have non-propagated tags?
> 
> That's *definitely* not the mistake.
> 
> I use private tags (and branches, for that matter) all the time. I'd be 
> very upset indeed if all my tags were always pushed out when I push 
> something out.
> 
> The mistake seems to be to think that tags get "versioned", and are part 
> of the tree history. That's insane. It means that you can never have a tag 
> to a newer tree than the one you are on.

The tags you use can simply be those from the tip of the repository,
regardless of which revision you've currently checked out.

> Tags are *independent* of history. They must be. They are "outside" 
> history, since the whole point of tags are to point to history.

Tags have history too. They are added at particular times by
particular people, and sometimes changed (this wouldn't happen in an
ideal world, but it happens). It's a shame not to be able to find this
history.


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 17:59                     ` Brendan Cully
@ 2007-02-02 18:19                       ` Jakub Narebski
  2007-02-02 19:28                         ` Brendan Cully
  2007-02-02 18:27                       ` Giorgos Keramidas
  2007-02-02 18:32                       ` Linus Torvalds
  2 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-02-02 18:19 UTC (permalink / raw)
  To: Brendan Cully; +Cc: Linus Torvalds, mercurial, git

Brendan Cully wrote:
> On Friday, 02 February 2007 at 08:42, Linus Torvalds wrote:
>> 
>> The mistake seems to be to think that tags get "versioned", and are part 
>> of the tree history. That's insane. It means that you can never have a tag 
>> to a newer tree than the one you are on.
> 
> The tags you use can simply be those from the tip of the repository,
> regardless of which revision you've currently checked out.

_Can_ be or _are_ (in Mercurial)? Besides, there can be more than one
tip of repository (branch are tips of history), and making set of tags
dependent on which branch you are on is not a good idea either.

>> Tags are *independent* of history. They must be. They are "outside" 
>> history, since the whole point of tags are to point to history.
> 
> Tags have history too. They are added at particular times by
> particular people, and sometimes changed (this wouldn't happen in an
> ideal world, but it happens). It's a shame not to be able to find this
> history.

That is what reflogs are for, although you usually don't enable this
for tags (because tags are meant to be immutable, especially signed
release tags).

Besides, in git annotated tags have tagger info, i.e. who and when
created a tag.


Besides tags point to history. Having them inside history is abstraction
breakage, IMVHO.
-- 
Jakub Narebski
Poland


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 17:59                     ` Brendan Cully
  2007-02-02 18:19                       ` Jakub Narebski
@ 2007-02-02 18:27                       ` Giorgos Keramidas
  2007-02-02 19:01                         ` Linus Torvalds
  2007-02-02 18:32                       ` Linus Torvalds
  2 siblings, 1 reply; 61+ messages in thread
From: Giorgos Keramidas @ 2007-02-02 18:27 UTC (permalink / raw)
  To: torvalds, jnareb, mercurial, git

On 2007-02-02 09:59, Brendan Cully <brendan@kublai.com> wrote:
>On Friday, 02 February 2007 at 08:42, Linus Torvalds wrote:
>> Tags are *independent* of history. They must be. They are "outside"
>> history, since the whole point of tags are to point to history.
>
> Tags have history too. They are added at particular times by
> particular people, and sometimes changed (this wouldn't happen in an
> ideal world, but it happens). It's a shame not to be able to find this
> history.

Agreed.  There is a _reason_ behind the -f option of 'cvs tag'.

Sometimes, 'sliding a tag' is a real-world need.  Losing the information
of who did the tag sliding and when, is not good.


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 17:59                     ` Brendan Cully
  2007-02-02 18:19                       ` Jakub Narebski
  2007-02-02 18:27                       ` Giorgos Keramidas
@ 2007-02-02 18:32                       ` Linus Torvalds
  2007-02-02 19:26                         ` Brendan Cully
  2 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-02-02 18:32 UTC (permalink / raw)
  To: Brendan Cully; +Cc: Jakub Narebski, mercurial, git



On Fri, 2 Feb 2007, Brendan Cully wrote:

> On Friday, 02 February 2007 at 08:42, Linus Torvalds wrote:
> > 
> > 
> > On Fri, 2 Feb 2007, Jakub Narebski wrote:
> > > 
> > > Gaaah. Why anyone would want to have non-propagated tags?
> > 
> > That's *definitely* not the mistake.
> > 
> > I use private tags (and branches, for that matter) all the time. I'd be 
> > very upset indeed if all my tags were always pushed out when I push 
> > something out.
> > 
> > The mistake seems to be to think that tags get "versioned", and are part 
> > of the tree history. That's insane. It means that you can never have a tag 
> > to a newer tree than the one you are on.
> 
> The tags you use can simply be those from the tip of the repository,
> regardless of which revision you've currently checked out.

Did you not understand the problem?

If I want to push out my history, that does NOT mean that I don't want to 
push out my tags. At least not to the public sites. I might want to push 
them out to my other *private* copies, though.

In other words, tags are just like branches. You don't tie two tags 
together, because one may (and does) make sense without the other.

Tying tags into history is silly. They're not "part of" history. They are 
pointers *to* history. And trying to make them part of history has all 
these obvious problems.

			Linus


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 17:37                   ` Matt Mackall
@ 2007-02-02 18:44                     ` Jakub Narebski
  2007-02-02 19:56                       ` Jakub Narebski
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-02-02 18:44 UTC (permalink / raw)
  To: Matt Mackall; +Cc: mercurial, git, Junio C Hamano

Matt Mackall wrote:
> On Fri, Feb 02, 2007 at 06:18:10PM +0100, Jakub Narebski wrote:

>> Revision-controlled (in-tree) tags are inane idea. Tags are non-moving
>> (and sometimes annotated) pointers to given point in history. They should
>> not depend on which branch you are, or what version you have checked out.
> 
> And.. they don't!

If that means that you always use the version of .hgtags from the tip
(branches are tips of history; they can have different .hgtags),
this is also broken; this means for example that you cannot compare
current version when on development head (branch) with tag on different
branch, those two branches have the same .hgtags file.

"They should not depend on which branch you are"... and they can.

> I'm now officially done correcting your uninformed perceptions. Come
> back when you've actually looked at the docs.

URL, pretty please?

My mistake is caused by the fact that .hgtags is special, i.e. not
current version is used (as e.g. with .scmignore files) but version
closest to the tip. This means broken abstraction.

-- 
Jakub Narebski
Poland


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 18:27                       ` Giorgos Keramidas
@ 2007-02-02 19:01                         ` Linus Torvalds
  2007-02-03 21:20                           ` Giorgos Keramidas
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-02-02 19:01 UTC (permalink / raw)
  To: Giorgos Keramidas; +Cc: jnareb, mercurial, git

On Fri, 2 Feb 2007, Giorgos Keramidas wrote:
> 
> Sometimes, 'sliding a tag' is a real-world need.  Losing the information
> of who did the tag sliding and when, is not good.

In practice, this is not much of an issue. 

First off, CVS tag usage is insane, but it's insane for *other* reasons 
(ie people use tags differently in CVS, but they do it not because they 
want to use tags that way, but because CVS makes it impossible to do 
anything saner).

So pointing to CVS tag usage as an argument is pointless. You might as 
well say that you shouldn't save the merge information, because CVS 
doesn't do it, and manual tags are a good way to do it. 

Secondly, the problems with tags having "history" is that you can't really 
resolve them anyway. You have to pick one. You can't "merge" them. 

In other words, tags are atomic *events*, not history. And I certainly 
agree that you shouldn't lose the events (unless you want to, of course).

I also do agree that you can absolutely have something that is basically a 
"tag that moves, and that you want to tie back to the previous state of 
the tag". In git, we just happen to call those things "branches". You 
*could* technically put one of those things into the tag-namespace if you 
want to, although it would largely be considered insane by most git users 
(and you could see it historically: each "tag" would be a merge that 
points to its previous incarnation and to the point in time that got 
tagged).

More commonly, you'd just use a "real tag", which includes the tagger 
information and a message about why something got tagged, plus possibly a 
PGP signature. That way, you can see (and save) all the individual events.
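
As a sketch of what such a "real tag" records, the snippet below builds a
tag object body and derives its id the way git derives object ids (SHA-1
over a "<type> <size>\0" header plus the body). The helper names, the
tagger identity, the timestamp and the all-zero target id are placeholders
made up for the example.

```python
import hashlib


def object_id(kind, body):
    """Hash an object as '<kind> <size>' + NUL + body, in git's style."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def make_tag(target, target_type, name, tagger, timestamp, tz, message):
    """Build the body of an annotated tag object and return (id, body)."""
    body = (
        f"object {target}\n"
        f"type {target_type}\n"
        f"tag {name}\n"
        f"tagger {tagger} {timestamp} {tz}\n"
        f"\n"
        f"{message}\n"
    ).encode("utf-8")
    # A PGP signature, when present, is simply appended to the message text.
    return object_id("tag", body), body


tag_id, body = make_tag(
    target="0" * 40,  # placeholder commit id
    target_type="commit",
    name="passed-regression-test",
    tagger="A U Thor <author@example.com>",
    timestamp=1170442860,
    tz="-0800",
    message="Ran the full regression suite; nothing new broke.",
)
print(tag_id)
print(body.decode("utf-8"))
```

Each such object is one self-contained event: who tagged, when, what, and
why. None of it needs a commit in the tagged project's history.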

		Linus


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 18:32                       ` Linus Torvalds
@ 2007-02-02 19:26                         ` Brendan Cully
  2007-02-02 19:42                           ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Brendan Cully @ 2007-02-02 19:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, mercurial, git

On Friday, 02 February 2007 at 10:32, Linus Torvalds wrote:
> 
> 
> On Fri, 2 Feb 2007, Brendan Cully wrote:
> 
> > On Friday, 02 February 2007 at 08:42, Linus Torvalds wrote:
> > > 
> > > 
> > > On Fri, 2 Feb 2007, Jakub Narebski wrote:
> > > > 
> > > > Gaaah. Why anyone would want to have non-propagated tags?
> > > 
> > > That's *definitely* not the mistake.
> > > 
> > > I use private tags (and branches, for that matter) all the time. I'd be 
> > > very upset indeed if all my tags were always pushed out when I push 
> > > something out.
> > > 
> > > The mistake seems to be to think that tags get "versioned", and are part 
> > > of the tree history. That's insane. It means that you can never have a tag 
> > > to a newer tree than the one you are on.
> > 
> > The tags you use can simply be those from the tip of the repository,
> > regardless of which revision you've currently checked out.
> 
> Did you not understand the problem?
> 
> If I want to push out my history, that does NOT mean that I don't want to 
> push out my tags. At least not to the public sites. I might want to push 
> them out to my other *private* copies, though.

I don't think I do, no. (Maybe it's the double negative construction.)
Local tags don't get pushed. Tags on private branches don't get
pushed. Tags on public branches do. This business you describe, where
you push tags around completely separate from the revisions they tag,
sounds a little odd. But nothing stops you from maintaining your local
tags in their own repository, if that's what makes you happy.

> In other words, tags are just like branches. You don't tie two tags 
> together, because one may (and does) make sense without the other.

Which tags are being tied together?

> Tying tags into history is silly. They're not "part of" history. They are 
> pointers *to* history. And trying to make them part of history has all 
> these obvious problems.

It seems to me they clearly do have history.


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 18:19                       ` Jakub Narebski
@ 2007-02-02 19:28                         ` Brendan Cully
  0 siblings, 0 replies; 61+ messages in thread
From: Brendan Cully @ 2007-02-02 19:28 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Linus Torvalds, mercurial, git

On Friday, 02 February 2007 at 19:19, Jakub Narebski wrote:
> Brendan Cully wrote:
> > On Friday, 02 February 2007 at 08:42, Linus Torvalds wrote:
> >> 
> >> The mistake seems to be to think that tags get "versioned", and are part 
> >> of the tree history. That's insane. It means that you can never have a tag 
> >> to a newer tree than the one you are on.
> > 
> > The tags you use can simply be those from the tip of the repository,
> > regardless of which revision you've currently checked out.
> 
> _Can_ be or _are_ (in Mercurial)? Besides, there can be more than one

are. The meaning of tags depends on the repository, not the "index".

> tip of repository (branch are tips of history), and making set of tags
> dependent on which branch you are on is not a good idea either.

agreed.


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 19:26                         ` Brendan Cully
@ 2007-02-02 19:42                           ` Linus Torvalds
  2007-02-02 19:55                             ` Brendan Cully
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-02-02 19:42 UTC (permalink / raw)
  To: Brendan Cully; +Cc: Jakub Narebski, mercurial, git

On Fri, 2 Feb 2007, Brendan Cully wrote:
> 
> I don't think I do, no. (Maybe it's the double negative construction.)
> Local tags don't get pushed. Tags on private branches don't get
> pushed. Tags on public branches do. This business you describe, where
> you push tags around completely separate from the revisions they tag,
> sounds a little odd. But nothing stops you from maintaining your local
> tags in their own repository, if that's what makes you happy.
> 
> > In other words, tags are just like branches. You don't tie two tags 
> > together, because one may (and does) make sense without the other.
> 
> Which tags are being tied together?

If you tie "tag" together with "history", and push out history, what 
happens?

> It seems to me they clearly do have history.

No they don't. Quite often, tags are generated outside of history, ie you 
tag something as being "known bad" long after it was done. Or you 
(hopefully) tag it with the test-information after it passed (or 
didn't pass) some debug check. Neither of which is something you'd do when 
the thing is actually committed or developed.

So tags are *events*. But if you think they are events "within" the 
history of a tree, you're missing a big issue.

My personal use of tags tends to be
 - I tag releases I make, and sign them etc.
 - when debugging (and using "git bisect" in particular), I tag things for 
   my own memory (ie if a bisection selected something that didn't 
   compile, and I have to pick another point by hand, I tag that bad one 
   temporarily for explanation - the tag shows up nicely in the graphical 
   history viewers)

The "release" tags are done as I develop, since _others_ will do 
regression tests etc later on. I don't know whether those others will add 
their own tags on top of my tag ("passed-regression-test" tag that points 
to my release-tag, which points to whatever commit I released), but it's 
really worth pointing out that that is just a small special case.

That *small* special case I wouldn't mind being part of history. But all 
the other tags should never be, since they are actually personal to 
whoever made them (even though others may well care: for example, if a 
regression run tags something as "passed", a lot of people will care: it 
doesn't mean that the tag should be entirely private!).

And because it's wrong in general to make the tags be bound to history 
(because they may or may not be relevant to others, and they may or may 
not actually happen _during_ the history), it's wrong to design the tags 
that way. Tags really are "outside" the thing, unless you live in a world 
where only the lead engineer is supposed to use tags.

I want tags to be useful for *anybody*. A total non-developer, who decides 
that he wants to test a release, should be able to tag the particular 
versions he happened to test, and it damn well shouldn't be just 
"my-tag-1023". It should allow him to write a small story about what the 
results of the tests were!

Which is how git tags are designed. They're separate from history, but that 
doesn't make them less useful - it makes them *more* widely useful.

		Linus


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 19:42                           ` Linus Torvalds
@ 2007-02-02 19:55                             ` Brendan Cully
  2007-02-02 20:15                               ` Jakub Narebski
  2007-02-02 20:21                               ` Linus Torvalds
  0 siblings, 2 replies; 61+ messages in thread
From: Brendan Cully @ 2007-02-02 19:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, mercurial, git

On Friday, 02 February 2007 at 11:42, Linus Torvalds wrote:
> 
> 
> On Fri, 2 Feb 2007, Brendan Cully wrote:
> > 
> > I don't think I do, no. (Maybe it's the double negative construction.)
> > Local tags don't get pushed. Tags on private branches don't get
> > pushed. Tags on public branches do. This business you describe, where
> > you push tags around completely separate from the revisions they tag,
> > sounds a little odd. But nothing stops you from maintaining your local
> > tags in their own repository, if that's what makes you happy.
> > 
> > > In other words, tags are just like branches. You don't tie two tags 
> > > together, because one may (and does) make sense without the other.
> > 
> > Which tags are being tied together?
> 
> If you tie "tag" together with "history", and push out history, what 
> happens?

The public tags on the public history get pushed. This still sounds to
me like the right thing.

> > It seems to me they clearly do have history.
> 
> No they don't. Quite often, tags are generated outside of history, ie you 
> tag something as being "known bad" long after it was done. Or you 
> (hopefully) tag it with the test-information after it passed (or 
> didn't pass) some debug check. Neither of which is something you'd do when 
> the thing is actually committed or developed.
>
> So tags are *events*. But if you think they are events "within" the 
> history of a tree, you're missing a big issue.

Your distinction between "history" and "events" is unclear to
me. What's history if not a series of events?

Just because a tag is created at a different time than the revision it
tags, that doesn't mean that it is ahistorical. It's still interesting
to know what the state of the repository was when the tag was
created.

> My personal use of tags tends to be
>  - I tag releases I make, and sign them etc.
>  - when debugging (and using "git bisect" in particular), I tag things for 
>    my own memory (ie if a bisection selected something that didn't 
>    compile, and I have to pick another point by hand, I tag that bad one 
>    temporarily for explanation - the tag shows up nicely in the graphical 
>    history viewers)

Mercurial supports local tags too. As far as I can tell, these
unversioned tags are about equivalent to git tags. They could
certainly be used for your bisection scenario.

> I want tags to be useful for *anybody*. A total non-developer, who decides 
> that he wants to test a release, should be able to tag the particular 
> versions he happened to test, and it damn well shouldn't be just 
> "my-tag-1023". It should allow him to write a small story about what the 
> results of the tests were!
> 
> Which is how git tags are designed. They're separate from history, but that 
> doesn't make them less useful - it makes them *more* widely useful.

Mercurial supports both, because both are useful.


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 18:44                     ` Jakub Narebski
@ 2007-02-02 19:56                       ` Jakub Narebski
  2007-02-03 20:06                         ` Brendan Cully
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-02-02 19:56 UTC (permalink / raw)
  To: Matt Mackall; +Cc: mercurial, git, Junio C Hamano

Jakub Narebski wrote:
> Matt Mackall wrote:
>> On Fri, Feb 02, 2007 at 06:18:10PM +0100, Jakub Narebski wrote:
> 
>>> Revision-controlled (in-tree) tags are inane idea. Tags are non-moving
>>> (and sometimes annotated) pointers to given point in history. They should
>>> not depend on which branch you are, or what version you have checked out.
>> 
>> And.. they don't!
> 
> If that means that you always use the version of .hgtags from the tip
> (branches are tips of history; they can have different .hgtags),
> this is also broken; this means for example that you cannot compare
> current version when on development head (branch) with tag on different
> branch, those two branches have the same .hgtags file.

I meant to write:

..._unless_ those two branches have the same .hgtags file.

> "They should not depend on which branch you are"... and they can.

For example you are on branch 'master', you tag current release
e.g. v1.3.4, then you checkout branch 'devel'... and you don't have
v1.3.4 tag available unless you merge in .hgtags from 'master'.
At least from what I understand of Mercurial tags behaviour.

Having to create a commit to remember tag which can be published...
I'm not sure if it is a good idea either. Junio creates "GIT 1.4.4.3"
commits, and those are tagged, so perhaps it is not such a bad idea
either.

You encourage to hand-edit .hgtags, but the edited version might
not be the one that is used (for example when starting a branch).

-- 
Jakub Narebski
Poland


* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 19:55                             ` Brendan Cully
@ 2007-02-02 20:15                               ` Jakub Narebski
  2007-02-02 20:21                               ` Linus Torvalds
  1 sibling, 0 replies; 61+ messages in thread
From: Jakub Narebski @ 2007-02-02 20:15 UTC (permalink / raw)
  To: Brendan Cully; +Cc: Linus Torvalds, mercurial, git

Brendan Cully wrote:

> Just because a tag is created at a different time than the revision it
> tags, that doesn't mean that it is ahistorical. It's still interesting
> to know what the state of the repository was when the tag was
> created.

I think not. Why would one want to know what the state of the
repository was when the tag (perhaps pointing to some old, historical
commit) was created?

-- 
Jakub Narebski
Poland

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 19:55                             ` Brendan Cully
  2007-02-02 20:15                               ` Jakub Narebski
@ 2007-02-02 20:21                               ` Linus Torvalds
  1 sibling, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-02-02 20:21 UTC (permalink / raw)
  To: Brendan Cully; +Cc: Jakub Narebski, mercurial, git

On Fri, 2 Feb 2007, Brendan Cully wrote:
> 
> The public tags on the public history get pushed. This still sounds to
> me like the right thing.

And what happens if you have two public tags?

Ok, now you have answered the "how is this tying them together" question 
yourself.

Notice? You tied them together, by tying them to something else.

And that is what I say is WRONG. Tags are independent, because different 
people have different needs for them.

Yes, you can say that the tags that the "main developer" does are "magic", 
and should always follow the history. But EVEN THAT is wrong. Because it 
makes a supposition that should not be one. You shouldn't start out with 
the assumption that there is one centralized place that makes the 
decisions. It may be how most open source projects work, but it's not what 
the tool should enforce. Especially when projects break/fork/whatever, the 
tool should _support_ that. It shouldn't say "ok, those people are 
special, what they do is special".

> > So tags are *events*. But if you think they are events "within" the 
> > history of a tree, you're missing a big issue.
> 
> Your distinction between "history" and "events" is unclear to
> me. What's history if not a series of events?

A lot of events are *independent*.

You seem to think that there is a dependency between tags that simply 
isn't there! You think it's ok to push them all out, because they are all 
"related". THAT IS NOT TRUE.

> Mercurial supports local tags too. As far as I can tell, these
> unversioned tags are about equivalent to git tags. They could
> certainly be used for your bisection scenario.

Can you push out your local tags? 

A tag isn't "globally local" or "globally global". *MY* local tags make 
sense on my machines. It's just that they don't make sense on the public 
tree. They're not "local to a repository". They are LOCAL TO MY NETWORK.

See? That's the kind of behaviour that git supports. You can publish one 
set of tags, and not publish another. They're not "different tags", and 
not publishing them in one place does NOT mean that you can't publish them 
somewhere else.
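
The point that publication is chosen per destination, not per tag, can be sketched in a few lines of Python. This is only a toy model and not git's implementation: refs are a plain dict, `publish` is a hypothetical helper, and the ref names, commit ids and glob patterns are invented for illustration.

```python
from fnmatch import fnmatchcase

def publish(local_refs, patterns):
    """Return the subset of refs that one destination would receive.

    local_refs: dict mapping ref name -> commit id
    patterns:   glob patterns chosen for this particular destination
    """
    return {
        name: commit
        for name, commit in local_refs.items()
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    }

# Hypothetical repository: two release tags plus tags that only matter in-house.
refs = {
    "refs/tags/v1.0": "a1",
    "refs/tags/v1.1": "b2",
    "refs/tags/qa/passed-nightly": "c3",
    "refs/tags/bisect/good": "d4",
}

# The same tags, published differently depending on where they go.
public_site = publish(refs, ["refs/tags/v*"])
company_net = publish(refs, ["refs/tags/v*", "refs/tags/qa/*"])
```

Here the qa tag reaches the company network but not the public site, and the bisect tag reaches neither; none of the tags had to be declared "local" or "global" when it was created.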

		Linus

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 19:56                       ` Jakub Narebski
@ 2007-02-03 20:06                         ` Brendan Cully
  2007-02-03 20:55                           ` Jakub Narebski
  0 siblings, 1 reply; 61+ messages in thread
From: Brendan Cully @ 2007-02-03 20:06 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Matt Mackall, Junio C Hamano, mercurial, git

On Friday, 02 February 2007 at 20:56, Jakub Narebski wrote:
> For example you are on branch 'master', you tag current release
> e.g. v1.3.4, then you checkout branch 'devel'... and you don't have
> v1.3.4 tag available unless you merge in .hgtags from 'master'.
> At least from what I understand of Mercurial tags behaviour.

This would be bad, if it were true.

$ hg up devel
2 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cat .hgtags
6acda9aa5d8c621b3db2f2daab878d8de726d227 base
$ hg tags
tip                                4:b1f003583d8e
v1.3.4                             2:87e43e86318f
base                               0:6acda9aa5d8c

As mentioned before, hg has local tags which sound an awful lot like
git tags. It also has properly versioned tags. And, by the way, if you
push a branch, you only push the tags that were committed on that
branch. Furthermore, you can push based on a tag name that isn't
committed in the branch you're pushing. I think the "globally global"
nonsense elsewhere in this thread may be a result of not understanding
this.

I'm probably done with this thread too. There's too much ignorant
speculation to make it very productive.

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-03 20:06                         ` Brendan Cully
@ 2007-02-03 20:55                           ` Jakub Narebski
  2007-02-03 21:00                             ` Jakub Narebski
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-02-03 20:55 UTC (permalink / raw)
  To: Brendan Cully; +Cc: Matt Mackall, Junio C Hamano, mercurial, git

On 03.02.2007, Brendan Cully <brendan@kublai.com> wrote:
> On Friday, 02 February 2007 at 20:56, Jakub Narebski wrote:

>> For example you are on branch 'master', you tag current release
>> e.g. v1.3.4, then you checkout branch 'devel'... and you don't have
>> v1.3.4 tag available unless you merge in .hgtags from 'master'.
>> At least from what I understand of Mercurial tags behaviour.
> 
> This would be bad, if it were true.
> 
> $ hg up devel
> 2 files updated, 0 files merged, 0 files removed, 0 files unresolved
> $ cat .hgtags
> 6acda9aa5d8c621b3db2f2daab878d8de726d227 base
> $ hg tags
> tip                                4:b1f003583d8e
> v1.3.4                             2:87e43e86318f
> base                               0:6acda9aa5d8c

The above sequence of commands is not enough to reproduce the situation
I want to talk about, namely the repository structure shown below:

                    /-\ 
   1---a---2---3---T---t---b   .... 'master' branch
        \ 
         \-2'--3'--c           .... 'devel' branch

where 'a' is the branching point (merge base) of the 'master' and 'devel'
branches, 'T' is the tagged changeset (revision, commit), and 't' is the
commit where .hgtags with the 'T' tag was committed. Changesets (revisions)
'b' and 'c' are the tips of the 'master' and 'devel' branches, respectively.

If .hgtags were an ordinary file, then at the revision marked '2' in the
graph above it wouldn't have tag 'T'.  The documentation (the Mercurial
HOWTO, to be more exact) says that hg uses the version of .hgtags from
the tip.  But when we are on branch 'devel', the version from the tip
is version 'c' without 'T', not version 'b' with 'T'... if .hgtags
behaves as described in the documentation.
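
The two readings of "uses .hgtags from the tip" can be contrasted in a toy Python model of the graph above. This is only a sketch of the two hypotheses and does not claim to reproduce Mercurial's actual lookup rules; the commit table and both helper functions are invented for illustration.

```python
# Each commit: (parent id or None, content of .hgtags at that commit).
# The ids follow the graph above; .hgtags is modelled as a dict name -> commit.
commits = {
    "1":  (None, {}),
    "a":  ("1",  {}),
    "2":  ("a",  {}),
    "3":  ("2",  {}),
    "T":  ("3",  {}),
    "t":  ("T",  {"v1.3.4": "T"}),   # commit that records the tag on 'T'
    "b":  ("t",  {"v1.3.4": "T"}),   # tip of 'master'
    "2'": ("a",  {}),
    "3'": ("2'", {}),
    "c":  ("3'", {}),                # tip of 'devel'
}

def tags_from_checkout(commit_id):
    """Tags visible if only the checked-out revision's .hgtags is read."""
    return dict(commits[commit_id][1])

def tags_from_all_heads(heads):
    """Tags visible if the .hgtags of every head is read and combined."""
    combined = {}
    for head in heads:
        combined.update(commits[head][1])
    return combined

on_devel_only = tags_from_checkout("c")
across_heads = tags_from_all_heads(["c", "b"])
```

Under the first reading the tag is invisible on 'devel'; under the second it is visible everywhere, which would be consistent with the quoted transcript where `hg tags` lists v1.3.4 although the checked-out .hgtags does not contain it.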

It looks, however (if what you say above also holds for the situation
in the graph above, i.e. if on the 'devel' branch we have 'T' in .hgtags),
as if Mercurial always uses the _latest_ version of the .hgtags file (as
in external wall-clock time, having nothing to do with the history as
represented in the repository). But then we could not say that .hgtags
files can be merged, so that is probably not the case. It is also
contrary to what I gathered from the documentation.

If above was true, i.e. .hgtags doesn't behave at all as normal file in 
working area, then what the heck it is doing there, and not somewhere 
under .hgtags!?!

> As mentioned before, hg has local tags which sound an awful lot like
> git tags. 

Git tags can be propagated. hg local tags cannot be propagated. hg tags 
"in history" always are propagated.

> It also has properly versioned tags.

Reusing in-tree version control to version tags is IMVHO not a good 
idea. Git has reflogs if you truly need to have history of tags.

> And, by the way, if you  
> push a branch, you only push the tags that were committed on that
> branch. Furthermore, you can push based on a tag name that isn't
> committed in the branch you're pushing. 

It seems awfully complicated.

> I think the "globally global" 
> nonsense elsewhere in this thread may be a result of not understanding
> this.
> 
> I'm probably done with this thread too. There's too much ignorant
> speculation to make it very productive.

-- 
Jakub Narebski
Poland

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-03 20:55                           ` Jakub Narebski
@ 2007-02-03 21:00                             ` Jakub Narebski
  0 siblings, 0 replies; 61+ messages in thread
From: Jakub Narebski @ 2007-02-03 21:00 UTC (permalink / raw)
  To: Brendan Cully; +Cc: Matt Mackall, Junio C Hamano, mercurial, git

Jakub Narebski wrote:
[...] 
> If above was true, i.e. .hgtags doesn't behave at all as normal file in 
> working area, then what the heck it is doing there, and not somewhere 
> under .hgtags!?!

I meant: not somewhere under .hg/ (in repository, and not in working area,
if it does not behave as an ordinary working area file)
-- 
Jakub Narebski
Poland

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-02 19:01                         ` Linus Torvalds
@ 2007-02-03 21:20                           ` Giorgos Keramidas
  2007-02-03 21:37                             ` Matthias Kestenholz
                                               ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Giorgos Keramidas @ 2007-02-03 21:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jnareb, mercurial, git

On 2007-02-02 11:01, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Fri, 2 Feb 2007, Giorgos Keramidas wrote:
>> Sometimes, 'sliding a tag' is a real-world need.  Losing the
>> information of who did the tag sliding and when, is not good.
>
> In practice, this is not much of an issue.

Sure it is.  Maybe not in the context of all projects or all teams, but
properly versioning tag names and knowing who installed the tag, and
when is quite often an issue with unversioned tags in some of the teams
I have worked with.

> First off, CVS tag usage is insane, but it's insane for *other*
> reasons (ie people use tags differently in CVS, but they do it not
> because they want to use tags that way, but because CVS makes it
> impossible to do anything saner).
>
> So pointing to CVS tag usage as an argument is pointless. You might as
> well say that you shouldn't save the merge information, because CVS
> doesn't do it, and manual tags are a good way to do it.
>
> Secondly, the problems with tags having "history" is that you can't
> really resolve them anyway. You have to pick one. You can't "merge"
> them.

Ok, maybe CVS was not so good as an example of why versioned tags *are*
useful, but my comment came from the experience I have with the tagging
of FreeBSD release builds.  The -STABLE branch of FreeBSD may be tagged
with RELENG_X_Y_Z_RELEASE at a particular point in time.  If we find
that some important bug fix has to go in, the fix is committed, and the
tag can 'slide' forward for only a particular file or set of files.

When tags are versioned, this operation is properly versioned too.  It's
apparent from browsing the global tag history that the specific tag
*was* moved forward; it's obvious where it was pointing before the
'slide' operation; it's obvious which files the 'slide' affected, etc.

Having the tags operations as an integral part of the visible history of
all public repositories is not necessarily useful for 100% of the
people who may skim through the logs, but I'm not sure why you suggest
that it's difficult to "merge" tags.

Since tags point to a very specific changeset hash, the hash serves as a
unique, unconflicting identifier of the tag's location in history.  When
a "pull" operation happens, there are no conflicts unless there is a
naming conflict between the "remote" and "local" repository.  It's not
impossible or even difficult to "merge" the two tag sets.
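
A minimal sketch of this kind of tag-set merge, in Python. It assumes tags are simply name -> changeset mappings; `merge_tag_sets` and the tag names and ids are hypothetical, not taken from any tool.

```python
def merge_tag_sets(local, remote):
    """Combine two name -> changeset maps; report names that disagree."""
    merged = dict(local)
    conflicts = {}
    for name, changeset in remote.items():
        if name in merged and merged[name] != changeset:
            # Same name, different target: someone has to pick one.
            conflicts[name] = (merged[name], changeset)
        else:
            merged[name] = changeset
    return merged, conflicts

local = {"release-1": "aaa111", "snapshot": "bbb222"}
remote = {"release-1": "aaa111", "snapshot": "ccc333", "nightly": "ddd444"}
merged, conflicts = merge_tag_sets(local, remote)
```

Tags that agree or exist on only one side combine without trouble; the one name that points at two different changesets is left unresolved, which is the case where a choice has to be made by a person.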

> In other words, tags are atomic *events*, not history. And I certainly
> agree that you shouldn't lose the events (unless you want to, of course).

Tags are a little of two different things:

(1) They are 'events' in the sense that someone has placed them to a
tree, and this operation is a very real, very natural event, and *this*
event should be versioned.

(2) They are 'pointers' to a particular changeset id.  The particular
changeset hash to which they point, when the user looks at a specific
revision of the history tree, is immutable from the point of view of
someone looking at this specific revision.  It may *change* as the
viewer moves back and forth into the history tree though.

The pointer-nature of tags doesn't need to be versioned when one looks
at one particular changeset, but the event-nature of their placement
into the tree *can* be versioned and IMHO it *should* be versioned.
Otherwise, there is no good way to provide accountability for these
events, and some part of the repository 'history' is lost.

> I also do agree that you can absolutely have something that is
> basically a "tag that moves, and that you want to tie back to the
> previous state of the tag". In git, we just happen to call those
> things "branches".

You're confusing a single, one-time movement of a tag (to point to a
place *after* a bugfix, for instance), with the creation of a new,
entirely separate, full branch.  One of them is ok in some cases; the
other is probably necessary in others.

I can understand why they don't _both_ seem useful for all possible
cases, but I don't see why we should limit ourselves to only one of the
two options.

- Giorgos

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-03 21:20                           ` Giorgos Keramidas
@ 2007-02-03 21:37                             ` Matthias Kestenholz
  2007-02-03 21:41                             ` Linus Torvalds
  2007-02-03 21:45                             ` Jakub Narebski
  2 siblings, 0 replies; 61+ messages in thread
From: Matthias Kestenholz @ 2007-02-03 21:37 UTC (permalink / raw)
  To: Giorgos Keramidas; +Cc: Linus Torvalds, jnareb, mercurial, git

On Sat, 2007-02-03 at 23:20 +0200, Giorgos Keramidas wrote:
> Ok, maybe CVS was not so good as an example of why versioned tags *are*
> useful, but my comment came from the experience I have with the tagging
> of FreeBSD release builds.  The -STABLE branch os FreeBSD may be tagged
> with RELENG_X_Y_Z_RELEASE at a particular point in time.  If we find
> that some important bug fix has to go in, the fix is committed, and the
> tag can 'slide' forward for only a particular file or set of files.
> 

This operation is seriously broken. The Z number should be incremented,
the tag should continue pointing at the (now known to be) broken
version. That's exactly what the patchlevel number is for.

If users want to know if there are important security updates around,
they will look at the version number. If you change the revision which
the tag points to you will _seriously_ confuse users, a lot more than a
long list of patchlevel versions will ever do.

Matthias

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-03 21:20                           ` Giorgos Keramidas
  2007-02-03 21:37                             ` Matthias Kestenholz
@ 2007-02-03 21:41                             ` Linus Torvalds
  2007-02-03 21:45                             ` Jakub Narebski
  2 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-02-03 21:41 UTC (permalink / raw)
  To: Giorgos Keramidas; +Cc: jnareb, mercurial, git

On Sat, 3 Feb 2007, Giorgos Keramidas wrote:
> 
> Sure it is.  Maybe not in the context of all projects or all teams, but
> properly versioning tag names and knowing who installed the tag, and
> when is quite often an issue with unversioned tags in some of the teams
> I have worked with.

I don't understand why you argue. Everybody agrees. This is not what I've 
been arguing against.

Git tags too (unless you are lazy or just don't _want_ to version them) are 
versioned. It's the whole reason why git has a whole separate "tag space". 
Not only that, but they are even cryptographically signed (again, this is 
not forced on you, but it's part of standard practice in git projects) 
with an author key, so that the tag actually says a lot more than just the 
version - it also gives you authenticity guarantees.

In the gitk history viewer, when you click on the tag, it will show you 
that. It will show you who tagged it, and when, and if two people tag with 
the same tag-name (or the same person renames a tag), you can use that to 
see which one you have.

So nobody disputes at all that it's good to see that kind of detail.

What I claim is simple: tags are independent of history. The fact that you 
have the history, doesn't necessarily mean that you should have the tag. 
Because some tags make sense for some people.

And it's *not* about being private to a repository. The relevance simply 
isn't a black-and-white "one repo or all repos" kind of choice. For 
example, you might have private tags within a company, and not choose to 
export those outside of the company. 

		Linus

* Re: newbie questions about git design and features (some wrt hg)
  2007-02-03 21:20                           ` Giorgos Keramidas
  2007-02-03 21:37                             ` Matthias Kestenholz
  2007-02-03 21:41                             ` Linus Torvalds
@ 2007-02-03 21:45                             ` Jakub Narebski
  2 siblings, 0 replies; 61+ messages in thread
From: Jakub Narebski @ 2007-02-03 21:45 UTC (permalink / raw)
  To: Giorgos Keramidas; +Cc: Linus Torvalds, mercurial, git

On 03-02-2007, Giorgos Keramidas wrote:
> On 2007-02-02 11:01, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>On Fri, 2 Feb 2007, Giorgos Keramidas wrote:

>>> Sometimes, 'sliding a tag' is a real-world need.  Losing the
>>> information of who did the tag sliding and when, is not good.
>>
>> In practice, this is not much of an issue.
> 
> Sure it is.  Maybe not in the context of all projects or all teams, but
> properly versioning tag names and knowing who installed the tag, and
> when is quite often an issue with unversioned tags in some of the teams
> I have worked with.

Knowing who made a tag, and when, is a totally separate issue from
versioning tag names. In git the first (along with the tag description
and optional PGP signing of the tag) is solved by using tag objects,
i.e. indirect pointers: a name like v1.4.0 (refs/tags/v1.4.0) refers
to a tag object, which looks like this:

  object 41292ddd37202ff6dce34986c87a6000c5d3fbfa
  type commit
  tag v1.4.0
  tagger Junio C Hamano <junkio@cox.net> Sat Jun 10 12:43:37 2006 -0700
  
  GIT 1.4.0
  -----BEGIN PGP SIGNATURE-----
  Version: GnuPG v1.4.3 (GNU/Linux)
  
  iD8DBQBEiyDswMbZpPMRm5oRAr5KAJ95nnyY8x7nRVIxkV87AHux6Kdf2gCgi4xu
  NxK2qKsAkGXCil7zSFviawA=
  =Qhax
  -----END PGP SIGNATURE-----
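
A minimal sketch, in Python, of how the "who and when" information can be read back out of such a tag object. It assumes the textual form displayed above (header lines, a blank line, then the message); `parse_tag_object` is a hypothetical helper, and the PGP signature block is left out of the sample for brevity.

```python
def parse_tag_object(text):
    """Split a tag object's text into its header fields and its message."""
    header_part, _, message = text.partition("\n\n")
    headers = {}
    for line in header_part.splitlines():
        # Each header line is "<key> <value>"; the value may contain spaces.
        key, _, value = line.partition(" ")
        headers[key] = value
    return headers, message

sample = (
    "object 41292ddd37202ff6dce34986c87a6000c5d3fbfa\n"
    "type commit\n"
    "tag v1.4.0\n"
    "tagger Junio C Hamano <junkio@cox.net> Sat Jun 10 12:43:37 2006 -0700\n"
    "\n"
    "GIT 1.4.0\n"
)

headers, message = parse_tag_object(sample)
```

The `object` header is the indirection mentioned above: the name v1.4.0 refers to the tag object, and the tag object in turn names the commit.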
 

The second, [local] versioning of tags (if it is really needed, see the
discussion below), is solved using reflogs for tags, which look like this:

000000... fab4f1... Jakub Narebski <jnareb@gmail.com> 1163751632 +0100 \
    fetch origin git://git.kernel.org/pub/scm/git/git.git: storing tag

(it is a single line, broken here for better readability). It is local
history; I don't think a global history of tags is needed (see below).
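
A minimal Python sketch of reading such a reflog line. It assumes the field order shown above (old id, new id, name, email, timestamp, timezone, message) with full 40-digit ids; the regular expression and `parse_reflog_line` are illustrative only, and the ids in the sample are placeholders because the line above abbreviates them.

```python
import re
from datetime import datetime, timedelta, timezone

REFLOG_LINE = re.compile(
    r"(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<name>.*) <(?P<email>[^>]*)> "
    r"(?P<seconds>\d+) (?P<tz>[+-]\d{4})\s+(?P<message>.*)"
)

def parse_reflog_line(line):
    """Return the fields of one reflog line as a dict, plus a 'when' datetime."""
    match = REFLOG_LINE.fullmatch(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a reflog line: %r" % line)
    entry = match.groupdict()
    # Turn "+0100" / "-0700" into a fixed UTC offset.
    sign = -1 if entry["tz"][0] == "-" else 1
    offset = sign * timedelta(hours=int(entry["tz"][1:3]),
                              minutes=int(entry["tz"][3:5]))
    entry["when"] = datetime.fromtimestamp(int(entry["seconds"]), timezone(offset))
    return entry

# Placeholder ids: all zeros for "old", an arbitrary value for "new".
sample_line = (
    "0" * 40 + " " + "1234abcd" * 5 + " "
    "Jakub Narebski <jnareb@gmail.com> 1163751632 +0100\t"
    "fetch origin git://git.kernel.org/pub/scm/git/git.git: storing tag"
)
entry = parse_reflog_line(sample_line)
```

Each entry records the old and new target together with who changed the ref and when, which is the local history of the tag referred to above.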

>> First off, CVS tag usage is insane, but it's insane for *other*
>> reasons (ie people use tags differently in CVS, but they do it not
>> because they want to use tags that way, but because CVS makes it
>> impossible to do anything saner).
>>
>> So pointing to CVS tag usage as an argument is pointless. You might as
>> well say that you shouldn't save the merge information, because CVS
>> doesn't do it, and manual tags are a good way to do it.
>>
>> Secondly, the problems with tags having "history" is that you can't
>> really resolve them anyway. You have to pick one. You can't "merge"
>> them.
> 
> Ok, maybe CVS was not so good as an example of why versioned tags *are*
> useful, but my comment came from the experience I have with the tagging
> of FreeBSD release builds.  The -STABLE branch os FreeBSD may be tagged
> with RELENG_X_Y_Z_RELEASE at a particular point in time.  If we find
> that some important bug fix has to go in, the fix is committed, and the
> tag can 'slide' forward for only a particular file or set of files.

That is a bad, bad idea. You should have a v1.4.4 tag, and a published tag
should be immutable, so that if someone says there is a bug in v1.4.4
you know exactly which version that is. If you want to gather fixes for
v1.4.4, you make a v1.4.4-fixes _branch_, commit the fix on this branch
(perhaps also merging it into current work), and tag the result v1.4.4.1
or v1.4.4-patch1. Tags are meant to be immutable.
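
The workflow described here can be sketched as a toy ref store in Python. This is only a model of the policy, not of git itself: the `Refs` class and the commit ids are hypothetical.

```python
class Refs:
    """Toy ref store: branches may move, published tags may not."""

    def __init__(self):
        self.branches = {}
        self.tags = {}

    def set_branch(self, name, commit):
        # A branch is a pointer that is expected to advance.
        self.branches[name] = commit

    def create_tag(self, name, commit):
        # A tag names one commit for good; re-pointing it is refused.
        if name in self.tags and self.tags[name] != commit:
            raise ValueError("tag %s already points at %s" % (name, self.tags[name]))
        self.tags[name] = commit

refs = Refs()
refs.create_tag("v1.4.4", "release-commit")        # hypothetical commit ids
refs.set_branch("v1.4.4-fixes", "release-commit")  # maintenance branch starts at the tag
refs.set_branch("v1.4.4-fixes", "bugfix-commit")   # the fix advances the branch
refs.create_tag("v1.4.4.1", "bugfix-commit")       # the fixed state gets a new name

try:
    refs.create_tag("v1.4.4", "bugfix-commit")     # "sliding" the old tag
    slide_refused = False
except ValueError:
    slide_refused = True
```

After this, v1.4.4 still names the original release, the branch carries the fix, and the fixed state is reachable under its own tag name.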

> When tags are versioned, this operation is properly versioned too.  It's
> apparent from browsing the global tag history that the specific tag
> *was* moved forward; it's obvious where it was pointing before the
> 'slide' operation; it's obvious which files the 'slide' affected, etc.

This operation IMHO makes no sense; I understand that you can have local,
private tags that you slide, like bisect-low, bisect-up, before-merge,...
Public tags: no.

[...]
>> In other words, tags are atomic *events*, not history. And I certainly
>> agree that you shouldn't lose the events (unless you want to, of course).
> 
> Tags are a little of two different things:
> 
> (1) They are 'events' in the sense that someone has placed them to a
> tree, and this operation is a very real, very natural event, and *this*
> event should be versioned.

Tags should not be placed in a tree.

[...]
-- 
Jakub Narebski
Poland

Thread overview: 61+ messages
2007-01-30 16:20 newbie questions about git design and features (some wrt hg) Mike Coleman
2007-01-30 16:41 ` Johannes Schindelin
2007-01-30 16:55 ` Shawn O. Pearce
2007-01-31  1:55   ` Theodore Tso
2007-01-31 10:56     ` Jakub Narebski
2007-01-31 20:01       ` Junio C Hamano
2007-01-31 22:25       ` Matt Mackall
2007-01-31 23:58         ` Jakub Narebski
2007-02-01  0:34           ` Matt Mackall
2007-02-01  0:57             ` Jakub Narebski
2007-02-01  7:59               ` Simon 'corecode' Schubert
2007-02-01 10:09                 ` Johannes Schindelin
2007-02-01 10:15                   ` Simon 'corecode' Schubert
2007-02-01 10:49                     ` Johannes Schindelin
2007-02-01 16:28                     ` Linus Torvalds
2007-02-01 19:36                       ` Eric Wong
2007-02-01 21:13                         ` Linus Torvalds
2007-02-02  9:55             ` Jakub Narebski
2007-02-02 13:51               ` Simon 'corecode' Schubert
2007-02-02 14:23                 ` Jakub Narebski
2007-02-02 15:02                   ` Shawn O. Pearce
2007-02-02 15:38               ` Mark Wooding
2007-02-02 16:09                 ` Jakub Narebski
2007-02-02 16:42                   ` Linus Torvalds
2007-02-02 16:59                     ` Jakub Narebski
2007-02-02 17:11                       ` Linus Torvalds
2007-02-02 17:59                     ` Brendan Cully
2007-02-02 18:19                       ` Jakub Narebski
2007-02-02 19:28                         ` Brendan Cully
2007-02-02 18:27                       ` Giorgos Keramidas
2007-02-02 19:01                         ` Linus Torvalds
2007-02-03 21:20                           ` Giorgos Keramidas
2007-02-03 21:37                             ` Matthias Kestenholz
2007-02-03 21:41                             ` Linus Torvalds
2007-02-03 21:45                             ` Jakub Narebski
2007-02-02 18:32                       ` Linus Torvalds
2007-02-02 19:26                         ` Brendan Cully
2007-02-02 19:42                           ` Linus Torvalds
2007-02-02 19:55                             ` Brendan Cully
2007-02-02 20:15                               ` Jakub Narebski
2007-02-02 20:21                               ` Linus Torvalds
2007-02-02 16:03               ` Matt Mackall
2007-02-02 17:18                 ` Jakub Narebski
2007-02-02 17:37                   ` Matt Mackall
2007-02-02 18:44                     ` Jakub Narebski
2007-02-02 19:56                       ` Jakub Narebski
2007-02-03 20:06                         ` Brendan Cully
2007-02-03 20:55                           ` Jakub Narebski
2007-02-03 21:00                             ` Jakub Narebski
2007-01-30 17:44 ` Jakub Narebski
2007-01-30 18:06 ` Linus Torvalds
2007-01-30 19:37   ` Linus Torvalds
2007-01-30 18:11 ` Junio C Hamano
2007-01-31  3:38   ` Mike Coleman
2007-01-31  4:35     ` Linus Torvalds
2007-01-31  4:57       ` Junio C Hamano
2007-01-31 16:22         ` Linus Torvalds
2007-01-31 16:41           ` Johannes Schindelin
2007-01-31  7:11       ` Mike Coleman
2007-01-31 15:03     ` Nicolas Pitre
2007-01-31 16:58       ` Mike Coleman
