git.vger.kernel.org archive mirror
From: Arne Babenhauserheide <arne_bab@web.de>
To: Jakub Narebski <jnareb@gmail.com>
Cc: mercurial@selenic.com, SLONIK.AZ@gmail.com, git@vger.kernel.org
Subject: Re: [VOTE] git versus mercurial (for DragonflyBSD)
Date: Mon, 27 Oct 2008 08:50:08 +0100
Message-ID: <200810270850.09696.arne_bab@web.de>
In-Reply-To: <200810270252.23392.jnareb@gmail.com>


On Monday, 27 October 2008 02:52:22, Jakub Narebski wrote:
> On Mon, 27 Oct 2008, Arne Babenhauserheide wrote:
> > On Sunday, 26 October 2008 19:55:09, Jakub Narebski wrote:
> > > I agree, and I think it is at least partially because of Git having
> > > cleaner design, even if you have to understand more terms at first.
> >
> > What do you mean by "cleaner design"?
>
> Clean _underlying_ design. Git has very nice underlying model of graph
> (DAG) of commits (revisions), and branches and tags as pointers to this
> graph.
>
> > From what I see (and in my definition of "design"), Mercurial is designed
> > as VCS with very clear and clean design, which even keeps things like
> > streaming disk access in mind.
>
> I have read description of Mercurial's repository format, and it is not
> very clear in my opinion. File changesets, bound using manifest, bound
> using changerev / changelog.

This becomes very simple if you keep a common filesystem layout in mind.

Think of inodes and data nodes (the files in the store), organized in
directories that hold many files (the manifests), which are in turn bound
into changesets that carry additional data.

> Mercurial relies on transactions and O_TRUNC support, while Git relies
> on atomic write and on updating data then updating reference to data.

For most operations Mercurial just relies on appending support. 
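
To illustrate what "relying on appending" plus truncation can look like,
here is a minimal sketch. It is a toy model, not Mercurial's actual code;
the class and method names are made up for this example. New data is only
ever appended, and an aborted transaction is undone by truncating the file
back to the size it had before.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy append-only file with rollback by truncation (hypothetical names)."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it is missing

    def size(self):
        return os.path.getsize(self.path)

    def append(self, data: bytes) -> int:
        """Append data and return the offset where it starts."""
        offset = self.size()
        with open(self.path, "ab") as f:
            f.write(data)
        return offset

    def rollback_to(self, offset: int):
        """Undo everything appended after offset by truncating the file."""
        with open(self.path, "r+b") as f:
            f.truncate(offset)


def demo():
    with tempfile.TemporaryDirectory() as d:
        log = AppendOnlyLog(os.path.join(d, "file.log"))
        log.append(b"rev0\n")
        journal = log.size()      # remember the size before the transaction
        log.append(b"rev1\n")     # the transaction appends new data...
        log.rollback_to(journal)  # ...and is then aborted
        with open(log.path, "rb") as f:
            return f.read()
```

Existing data is never rewritten in place, so a reader that only looks at
offsets below the recorded size never sees a half-finished transaction.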

> I don't quite understand comment about streaming disk access...

If you tell a disk "give me files a, b, c, d, e, f (of the whole abc)", it is 
faster than if you tell it "give me files a k p q s t", because the filesystem 
can optimize that call more easily. 

That's why for example Mercurial avoids hashing filenames. 

> Well, they have to a lot less than they used to, and there is
> "git gc --auto" that can be put in crontab safely.

That relies on crontab, which might not be available on all systems (I only 
use GNU/Linux, but what about friends of mine who have to use Windows?)

> Explicit garbage collection was a design _decision_, not a sign of not
> clear design. We can argue if it was good or bad decision, but one
> should consider the following issues:
>
>  * Rolling back last commit to correct it, or equivalently amending
>    last commit (for example because we forgot some last minute change,
>    or forgot to signoff a commit), or backing out of changes to the
>    last commit in Mercurial relies on transactions (and locking) and
>    correct O_TRUNC, while in Git it leaves dangling objects to be
>    garbage collected later.

As far as I know, the only problem with O_TRUNC was that it sadly had bugs in 
Linux.

>  * Mercurial relies on transaction support. Git relies on atomic write
>    support and on the fact that objects are immutable; those that are
>    not needed are garbage collected later. Beside IIRC some of ways of
>    implementing transaction in databases leads to garbage collecting.

But Mercurial normally works on standard filesystems, so this isn't the case 
for normal operations. 

You could say, though, that git implements a very simple transaction model: 
Keep all old data until it gets purged explicitly. 
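
That model can be sketched in a few lines. This is a toy illustration
under my own assumptions, not git's implementation: the ObjectStore class,
its methods and the way ids are computed are all invented for the example.
Objects are immutable, amending only moves the branch pointer, and the old
object stays around until an explicit gc() removes whatever is unreachable.

```python
import hashlib


class ObjectStore:
    """Toy immutable object store with explicit garbage collection."""

    def __init__(self):
        self.objects = {}  # id -> (content, parent id or None)
        self.refs = {}     # branch name -> id of the tip commit

    def _store(self, content, parent):
        oid = hashlib.sha1(repr((content, parent)).encode()).hexdigest()
        self.objects[oid] = (content, parent)
        return oid

    def commit(self, branch, content):
        """Add a new commit on top of the branch tip."""
        oid = self._store(content, self.refs.get(branch))
        self.refs[branch] = oid
        return oid

    def amend(self, branch, content):
        """Replace the tip; the old tip is left dangling, not deleted."""
        old_parent = self.objects[self.refs[branch]][1]
        oid = self._store(content, old_parent)
        self.refs[branch] = oid
        return oid

    def gc(self):
        """Delete every object that no branch can reach; return their ids."""
        reachable = set()
        for oid in self.refs.values():
            while oid is not None and oid not in reachable:
                reachable.add(oid)
                oid = self.objects[oid][1]
        removed = [o for o in self.objects if o not in reachable]
        for o in removed:
            del self.objects[o]
        return removed


store = ObjectStore()
first = store.commit("master", "v1")
typo = store.commit("master", "v2 with typo")
fixed = store.amend("master", "v2")
dangling_before_gc = typo in store.objects  # still stored after the amend
removed = store.gc()                        # purged only on explicit request
```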

>  * Explicit packing and having two repository "formats": loose and
>    packed is a bit of historical reason: at the beginning there was
>    only loose format. Pack format was IIRC invented for network
>    transport, and was used for on disk storage (the same format!) for
>    better I/O patterns[1]. Having packs as 'rewrite to pack' instead
>    of 'append to pack' allows to prefer recency order, which result in
>    faster access as objects from newer commits are earlier in delta
>    chain and reduction in size in usual case of size growing with time
>    as recency order allows to use delete deltas. Also _choosing_ base
>    object allows further reduce size, especially in presence of
>    nonlinear history.

So having multiple packs is equivalent to the automatic snapshot system in 
Mercurial which doesn't need user interaction. 

>  * From what I understand Mercurial by default uses packed format for
>    branches and tags; Git uses "loose" format for recent branches
>    (meaning one file per branch), while packing older references.
>    Using loose affects performance (and size) only for insane number of
>    references, and only for some operations like listing all references,
>    while using packed format is IMHO a bit error prone when updating.

As far as I know, Mercurial got that "using packed format" right from the 
beginning. 

>  * Git has reflogs which are pruned (expired) during garbage collecting
>    to not grow them without bounds; AFAIK Mercurial doesn't have
>    equivalent of this feature.
>
>    (Reflogs store _local_ history of branch tip, noting commits,
>    fetches, merges, rewinding branch, switching branches, etc.)

As far as I know Mercurial only tracks the state of the working directory, so 
it doesn't track your whole local history. 

But others can tell you about that in greater detail. 

> [1] You wrote about "streaming disk access". Git relies (for reading)
> on good mmap implementation.
>
> > In git it has to check all changesets which affect the file.
>
> I don't understand you here... if I understand correctly above,
> then you are wrong about Git.

Might be that I remember incorrectly about what git does. 

Are its commits "the whole changed file" or "the diff of the changes"? 

If the latter, it needs to walk back all commits to the snapshot revision to 
get the file data. 
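
To make the cost I mean concrete, here is a toy model of "diff" storage
with occasional full snapshots. It is only a sketch under assumptions of my
own: the revision list, the "snapshot"/"delta" tags and the append-only
delta format are made up and describe neither git's nor Mercurial's real
on-disk format.

```python
def reconstruct(revisions, rev):
    """Rebuild revision rev from the nearest full snapshot at or before it.

    Returns the text and the number of deltas that had to be applied.
    """
    base = rev
    while revisions[base][0] != "snapshot":
        base -= 1                      # walk back to the last full snapshot
    text = revisions[base][1]
    steps = 0
    for _kind, payload in revisions[base + 1 : rev + 1]:
        text += payload                # toy delta: text appended to the file
        steps += 1
    return text, steps


revisions = [
    ("snapshot", "a\n"),
    ("delta", "b\n"),
    ("delta", "c\n"),
    ("snapshot", "a\nb\nc\nd\n"),
    ("delta", "e\n"),
]

text2, steps2 = reconstruct(revisions, 2)  # two deltas on top of revision 0
text4, steps4 = reconstruct(revisions, 4)  # one delta on top of revision 3
```

The more often a full snapshot is stored, the shorter the walk back, at the
price of more disk space.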

One story I experienced with that: 

My amd64 GNU/Linux box suffers from performance problems when it gets high 
levels of disk activity (something about the filesystem layer doesn't play 
well with amd64 - reported by others, too). 

When I pulled the Linux kernel repository with git half a year ago, my disk 
started clicking and the whole computer slowed down to a crawl. 

When I pulled the same repository data from a Mercurial repository, the 
computer kept running smoothly, the disk stayed silent and happily wrote the 
data. 

Mercurial felt smooth, while git felt damn clumsy (though not slow). 

> > 1) Hg is easy to understand
>
> Because it is simple... and less feature rich, c.f. multiple local
> branches in single repository.

That works quite well. People just don't use it very often, because the 
workflow of having multiple repositories is easier with hg. 

> > 2) You don't have to understand it to use it
>
> You don't have to understand details of Git design (pack format, index,
> stages, refs,...) to use it either.

As I remember it, that was not true about half a year ago, when I stumbled 
over many problems in git whenever I tried to do something a bit nonstandard. 

It took me hours (and in the end asking a friend) to find out about 

"git checkout ."

just to get back my deleted files. 

The answer I got when I asked why it's done that way was "this is because of 
the inner workings of git. You should know them if you use it". 

> > And both are indications of a good design, the first of the core, the
> > second of the UI.
>
> Well, Git is built around concept of DAG of commits and branches as
> references to it. Without it you can use Git, but it is hard. But
> if you understand it, you can understand easily most advanced Git
> features.
>
> I agree that Mercurial UI is better; as usually in "Worse is Better"
> case... :-)

What do you mean by that? 

Best wishes, 
Arne

-- My stuff: http://draketo.de - stories, songs, poems, programs and stuff :)
-- Infinite Hands: http://infinite-hands.draketo.de - singing a part of the 
history of free software.
-- Ein Würfel System: http://1w6.org - simply clean (role-playing) rules.

-- PGP/GnuPG: http://draketo.de/inhalt/ich/pubkey.txt


