* Re: [Census] So who uses git?
@ 2006-02-01  7:08 linux
  2006-02-01  8:51 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: linux @ 2006-02-01  7:08 UTC (permalink / raw)
  To: torvalds; +Cc: git, linux

> Yes, I think the "assume unchanged" flag goes well together with making 
> sure that the checked-out file is non-writable at the time.
> 
> Of course, any number of editors and other actions won't care: if you do 
> anything like
> 
> 	for i in *.c
> 	do
> 		sed 's/xyzzy/bas/g' < $i > $i.new
> 		mv $i.new $i
> 	done
> 
> you'll never have even noticed that the old file was marked read-only. So 
> it's obviously not in any way any guarantee, but it probably makes sense 
> as a crutch.

At the risk of complicating something already very complicated, and
possibly breaking on Microsoft file systems, that case can be detected
by reading the directory and noticing that the inode number changed.

Would it be worth validating the inode numbers (which can be retrieved
in a batch) even if you don't do a full lstat()?

Or is that too Unix-centric and prone to performance problems on other
file systems?  I'd think that, even if a file system used fake inode
numbers, they'd be pretty consistent if you didn't touch the file at all,
and being different would just cause a more expensive validation.
Which would be okay as long as it's infrequent.
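
As a rough illustration of the idea above (this is not git code): the
Python sketch below replaces a read-only file the same way the sed/mv
loop does, and compares the inode numbers reported by a directory scan
before and after.  The helper names are made up for this example, and it
assumes a POSIX-style file system with real inode numbers.

```python
import os
import stat
import tempfile


def inode_from_directory(dirpath, name):
    """Return the inode number for `name` as reported by a directory scan."""
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if entry.name == name:
                # On POSIX this is taken from the directory entry (d_ino);
                # on Windows, Python may need an extra system call here.
                return entry.inode()
    raise FileNotFoundError(name)


def replace_like_sed_mv(path, old, new):
    """Mimic `sed 's/old/new/g' < f > f.new; mv f.new f`."""
    with open(path) as src, open(path + ".new", "w") as dst:
        dst.write(src.read().replace(old, new))
    os.replace(path + ".new", path)  # rename over the original


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a.c")
        with open(path, "w") as f:
            f.write("xyzzy\n")
        os.chmod(path, stat.S_IRUSR)  # mark the file read-only
        before = inode_from_directory(d, "a.c")
        replace_like_sed_mv(path, "xyzzy", "bas")
        after = inode_from_directory(d, "a.c")
        with open(path) as f:
            content = f.read()
        return before, after, content
```

The read-only bit does not stop the rename, but the new file necessarily
has a different inode number because both files existed at the same time.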

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  7:08 [Census] So who uses git? linux
@ 2006-02-01  8:51 ` Junio C Hamano
  2006-02-01 16:04 ` Linus Torvalds
  2006-02-01 16:10 ` Alex Riesen
  2 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01  8:51 UTC (permalink / raw)
  To: linux; +Cc: git

linux@horizon.com writes:

> At the risk of complicating something already very complicated, and
> possibly breaking on Microsoft file systems, that case can be detected
> by reading the directory and noticing that the inode number changed.
>
> Would it be worth validating the inode numbers (which can be retrieved
> in a batch) even if you don't do a full lstat()?

I suspect that what you said about Microsoft filesystems is even
stronger. IIRC the latest Cygwin stopped giving d_ino regardless
of the filesystem type; you need to do a stat() anyway.

* Re: [Census] So who uses git?
  2006-02-01  7:08 [Census] So who uses git? linux
  2006-02-01  8:51 ` Junio C Hamano
@ 2006-02-01 16:04 ` Linus Torvalds
  2006-02-01 16:10 ` Alex Riesen
  2 siblings, 0 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01 16:04 UTC (permalink / raw)
  To: linux; +Cc: git



On Tue, 1 Feb 2006, linux@horizon.com wrote:
> 
> At the risk of complicating something already very complicated, and
> possibly breaking on Microsoft file systems, that case can be detected
> by reading the directory and noticing that the inode number changed.
> 
> Would it be worth validating the inode numbers (which can be retrieved
> in a batch) even if you don't do a full lstat()?

I don't think it's worth it. It's the unusual case anyway, and it doesn't 
even really guarantee anything either (the person _could_ just have marked 
the inode writable - not understanding what is going on, he could have 
just done a "chmod +w" behind git's back).

Together with the fact that it might not work everywhere, and that I could 
well imagine that "readdir()" is slow on cygwin too (how does it do 
"d_ino"? Maybe it has to do a stat() to emulate unix behaviour?), I'm not 
convinced it's worth it.

I think the whole "assume it's valid" is a crutch - but if we do it, we 
should make it _really_ fast, because it's also useful for automated 
procedures that _know_ which files they touch. So we should make it have 
minimal impact.
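
A minimal sketch of what "really fast" could mean here, assuming a
simplified index whose entries carry a per-path flag.  The `Entry` class
and the `assume_unchanged` field are hypothetical names for illustration,
not git's actual data structures.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    size: int
    mtime_ns: int
    assume_unchanged: bool = False


def refresh(entries):
    """Return (changed_paths, number_of_lstat_calls)."""
    changed, calls = [], 0
    for e in entries:
        if e.assume_unchanged:
            continue  # trust the flag: no system call at all
        st = os.lstat(e.path)
        calls += 1
        if (st.st_size, st.st_mtime_ns) != (e.size, e.mtime_ns):
            changed.append(e.path)
    return changed, calls


def demo():
    with tempfile.TemporaryDirectory() as d:
        entries = []
        for name in ("a.c", "b.c"):
            p = os.path.join(d, name)
            with open(p, "w") as f:
                f.write("x\n")
            st = os.lstat(p)
            entries.append(Entry(p, st.st_size, st.st_mtime_ns))
        entries[0].assume_unchanged = True
        for e in entries:  # modify both files behind the index's back
            with open(e.path, "a") as f:
                f.write("more\n")
        changed, calls = refresh(entries)
        return [os.path.basename(p) for p in changed], calls
```

In the demo both files are modified, but only the unflagged one is
noticed, at the cost of a single lstat().  That is the crutch: the flag
saves the system call and is only as reliable as whoever set it.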

		Linus

* Re: [Census] So who uses git?
  2006-02-01  7:08 [Census] So who uses git? linux
  2006-02-01  8:51 ` Junio C Hamano
  2006-02-01 16:04 ` Linus Torvalds
@ 2006-02-01 16:10 ` Alex Riesen
  2006-02-01 21:27   ` linux
  2 siblings, 1 reply; 84+ messages in thread
From: Alex Riesen @ 2006-02-01 16:10 UTC (permalink / raw)
  To: linux; +Cc: torvalds, git

On 1 Feb 2006 02:08:47 -0500, linux@horizon.com <linux@horizon.com> wrote:
> > Yes, I think the "assume unchanged" flag goes well together with making
> > sure that the checked-out file is non-writable at the time.
> >
> > Of course, any number of editors and other actions won't care: if you do
> > anything like
> >
> >       for i in *.c
> >       do
> >               sed 's/xyzzy/bas/g' < $i > $i.new
> >               mv $i.new $i
> >       done
> >
> > you'll never have even noticed that the old file was marked read-only. So
> > it's obviously not in any way any guarantee, but it probably makes sense
> > as a crutch.
>
> At the risk of complicating something already very complicated, and
> possibly breaking on Microsoft file systems, that case can be detected
> by reading the directory and noticing that the inode number changed.

Inodes are either useless or dangerous in cygwin (hash of an
absolute pathname on FAT). They may not even change after rm+touch.

* Re: [Census] So who uses git?
  2006-02-01 16:10 ` Alex Riesen
@ 2006-02-01 21:27   ` linux
  0 siblings, 0 replies; 84+ messages in thread
From: linux @ 2006-02-01 21:27 UTC (permalink / raw)
  To: linux, raa.lkml; +Cc: git, torvalds

> Inodes are either useless or dangerous in cygwin (hash of an
> absolute pathname on FAT). They may not even change after rm+touch.

Yes, I just looked it up and found that out.  I was hoping they used
first block number like many Linux FSes have tried, in which case it
would have worked, but if it's a hash of the path name, it's
guaranteed not to change.

And Linus' point is excellent, too: this feature is also useful
for automated systems (like git-applypatch) that can be assumed to
never forget to warn git ahead of time.

* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
                                         ` (3 preceding siblings ...)
  2006-02-01 19:20                       ` Julian Phillips
@ 2006-02-06 21:15                       ` Chuck Lever
  4 siblings, 0 replies; 84+ messages in thread
From: Chuck Lever @ 2006-02-06 21:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ray Lehtiniemi, Alex Riesen, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Martin Langhoff, Git Mailing List,
	Trond Myklebust

Linus Torvalds wrote:
>>for comparison, one of our sandboxes is sitting on an NTFS file system,
>>accessed via SMB:
>>
>>  smbfs$ time git update-index --refresh
>>  real    11m36.502s
>>  user    0m6.830s
>>  sys     0m5.086s
> 
> 
> Ouch, ouch, ouch.
> 
> Sounds like every single stat() will go out the wire. I forget what the 
> Linux NFS client does, but I _think_ it has a metadata timeout that avoids 
> this. But it might be as bad under NFS.
> 
> Has anybody used git over NFS? If it's this bad (or even close to), I 
> guess the "mark files as up-to-date in the index" approach is a really 
> good idea..
> 
> Of course, the whole point of git is that you should keep your repository 
> close, but sometimes NFS - or similar - is enforced upon you by other 
> issues, like the fact that the powers-that-be want anonymous workstations 
> and everybody should work with a home-directory automounted over NFS..

yes, i keep my Linux kernel repository in NFS (and my stgit and git 
repositories too).

there are some things that are slow precisely because my think time is 
longer than the NFS client's attribute timeout, which means that all of 
git's lstat()s turn into GETATTRs.

using the "noatime,nodiratime,actimeo=7200" mount options can have some 
benefit.  however, i found that keeping the repository packed provides 
the greatest positive impact.  that means that most of the objects are 
in a single file, and can be validated with just one GETATTR.

one thing we might conclude from this is that making "packing" an 
efficient operation (or even an incremental one) would go a long way to 
helping performance on network file systems.
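
To make the point above concrete, here is a small sketch that counts how
many files sit under an objects directory, on the rough assumption that
each file costs about one attribute lookup over the wire.  It relies on
the usual layout (two-character fan-out directories for loose objects, a
"pack" directory for packs); the function name is made up for this
example.

```python
import os
import tempfile


def files_to_validate(objects_dir):
    """Count loose object files and pack files under an objects directory."""
    loose = packs = 0
    for name in os.listdir(objects_dir):
        sub = os.path.join(objects_dir, name)
        if name == "pack":
            packs = sum(1 for f in os.listdir(sub) if f.endswith(".pack"))
        elif len(name) == 2 and os.path.isdir(sub):
            loose += len(os.listdir(sub))  # one file per loose object
    return loose, packs


def demo():
    with tempfile.TemporaryDirectory() as d:
        for fan, names in (("ab", ("1", "2")), ("cd", ("3",))):
            os.makedirs(os.path.join(d, fan))
            for n in names:
                open(os.path.join(d, fan, n), "w").close()
        os.makedirs(os.path.join(d, "pack"))
        for n in ("p.pack", "p.idx"):
            open(os.path.join(d, "pack", n), "w").close()
        return files_to_validate(d)
```

After a full repack the loose count drops towards zero while the pack
count stays small, which is where the saving described above comes from.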

* Re: [Census] So who uses git?
  2006-02-01 22:50                               ` Junio C Hamano
@ 2006-02-02 14:59                                 ` Andreas Ericsson
  0 siblings, 0 replies; 84+ messages in thread
From: Andreas Ericsson @ 2006-02-02 14:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Linus Torvalds, git, Joel Becker

Junio C Hamano wrote:
> 
> I do not particularly have much preference among --also,
> --with-index, or --incremental, but:
> 
>  - 'with-index' is precise but might be too technical;
>  - 'incremental' is not really incremental -- you can use it
>    only once.
> 
> Because you do not have to say "git commit --also" without paths
> (which _is_ awkward) to get the traditional behaviour, maybe it
> is a good name for that flag (it is also the shortest).
> 

Except that -a, which is the logical shorthand, is already taken. How 
about --include (or --include-index, or --index) and -i? commit being a 
fairly commonly used command, I think it's safe to assume that most 
people will read the man-page or the help output if there's something 
they don't understand.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

* Re: [Census] So who uses git?
  2006-02-01 16:25                         ` Linus Torvalds
@ 2006-02-02  9:12                           ` Alex Riesen
  0 siblings, 0 replies; 84+ messages in thread
From: Alex Riesen @ 2006-02-02  9:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Ray Lehtiniemi, Radoslaw Szkodzinski,
	Keith Packard, Junio C Hamano, cworth, Git Mailing List

On 2/1/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > $ time git update-index --refresh
> >
> > real    0m21.500s
> > user    0m0.358s
> > sys     0m1.406s
> >
> > WinNT, NTFS, 13k files, hot cache.
>
> That's 25% less files than the Linux kernel, and I can do that operation
> in 0m0.062s (0.012s user, 0.048s system).

Correction: it's 18k files, which is almost the same as 2.6.13-rc6. But these
files have *very* long names (the project is poisoned by classical C++
education and breaks Windows' 255-character limit on filename length from
time to time). Refreshing the index in 2.6.13 is actually consistently faster:

$ cd src/linux-2.6.13-rc6
$ time git update-index --refresh
real    0m1.344s
user    0m0.358s
sys     0m0.984s

> So WinNT/cygwin is about 2.5 _orders_of_magnitude_ slower here, or 340
> times slower.
>
> Now, I'm tempted to say that NT is a piece of sh*t, but the fact is, your
> CPU-times seem to indicate that most of it is IO (and the "real" cost is
> just 1.7 seconds, much of which is system time, which in turn itself is
> probably due to the IO costs too - so even that isn't comparable with
> the ).
>
> Which may mean that you simply don't have enough memory to cache the whole
> thing. Which may be NT sucking, of course ("we don't like to use more than
> 10% of memory for caches"), but it might also be a tunable (which is sucky
> in itself, of course), but finally, it might just be that you just don't
> have a ton of memory. I've got 2GB in my machines, although 1GB is plenty
> to cache the kernel.
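
For reference, the ratios quoted above can be checked with a few lines of
Python, using only the timings already given in this thread (21.500s
real, 0.358s user and 1.406s sys on WinNT; 0.062s on Linux).  The
function name is just for this example.

```python
import math


def slowdown(slow_seconds, fast_seconds):
    """Return (ratio, orders_of_magnitude) between two timings."""
    ratio = slow_seconds / fast_seconds
    return ratio, math.log10(ratio)


# Wall-clock time on WinNT/cygwin versus Linux.
real_ratio, real_orders = slowdown(21.500, 0.062)

# CPU time only (user + sys) on WinNT, versus the same Linux figure.
cpu_seconds = 0.358 + 1.406
cpu_ratio, cpu_orders = slowdown(cpu_seconds, 0.062)
```

The wall-clock ratio comes out a little under 350 (about 2.5 orders of
magnitude), while the CPU-only ratio is under 30.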

I have 2Gb, the "System Cache" is around 1.5Gb, and this is PIV 3.2GHz.
There seem to be no tunables for any kind of system stuff
(savin' on support costs, do they?).
You'd be very hardpressed not to say that windows is a piece of sh*t.

The "benchmark", several times in a row:

$ time git update-index --refresh
real    0m1.766s
user    0m0.498s
sys     0m1.203s

$ time git update-index --refresh
real    0m1.766s
user    0m0.358s
sys     0m1.390s

$ time git update-index --refresh
real    0m1.781s
user    0m0.420s
sys     0m1.311s

$ time git update-index --refresh
real    0m1.875s
user    0m0.374s
sys     0m1.343s

$ time git update-index --refresh
real    0m1.766s
user    0m0.326s
sys     0m1.375s

It is always almost the same time. I don't think it's IO, looks more like
cache accesses. It is just that bad in this cygwin+win2k combination.
Besides, I don't trust "time <command>" on windows much: it returned
sys time 0 for git-update-index in a directory which was read before.
Yes, there was disk activity, I can hear it real good with that Barracuda.

* Re: [Census] So who uses git?
  2006-02-01 21:59                           ` Junio C Hamano
  2006-02-01 22:25                             ` Nicolas Pitre
  2006-02-01 22:35                             ` Linus Torvalds
@ 2006-02-01 22:57                             ` Daniel Barkalow
  2 siblings, 0 replies; 84+ messages in thread
From: Daniel Barkalow @ 2006-02-01 22:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Nicolas Pitre, git

On Wed, 1 Feb 2006, Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > If somebody doesn't know about the index, he normally will never have 
> > index changes _anyway_, except for the "git add" case. In which case "git 
> > commit" does the right thing for him: it will either commit the added 
> > files, or it will say "nothing to commit".
> 
> ... the original complaint was that "git commit" without
> explicit paths does not quack like "cvs/svn commit" -- commit
> all my changes in the working tree.

Actually, the original complaint was about "git commit path ...", I 
believe. That's the case where new users are finding that the behavior is 
surprising, rather than just unfamiliar.

	-Daniel
*This .sig left intentionally blank*

* Re: [Census] So who uses git?
  2006-02-01 22:25                             ` Nicolas Pitre
@ 2006-02-01 22:50                               ` Junio C Hamano
  2006-02-02 14:59                                 ` Andreas Ericsson
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01 22:50 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git, Joel Becker

Nicolas Pitre <nico@cam.org> writes:

> Actually, my opinion is that should be the behavior for your first item 
> above (when only filenames are specified).  If you want to _also_ 
> include the index like you describe in your first item then an 
> additional switch should be provided.

OK, agreed.  Sorry to be slow.

So, to recap:

git commit paths...			(temporary index thing)
git commit --incremental paths...	(same as current w/o --incremental)
git commit               		(same as current)
git commit -a				(same as current)	

And I agree with Joel that we should not automatically imply
"git add" with or without --incremental.

I do not particularly have much preference among --also,
--with-index, or --incremental, but:

 - 'with-index' is precise but might be too technical;
 - 'incremental' is not really incremental -- you can use it
   only once.

Because you do not have to say "git commit --also" without paths
(which _is_ awkward) to get the traditional behaviour, maybe it
is a good name for that flag (it is also the shortest).

* Re: [Census] So who uses git?
  2006-02-01 21:59                           ` Junio C Hamano
  2006-02-01 22:25                             ` Nicolas Pitre
@ 2006-02-01 22:35                             ` Linus Torvalds
  2006-02-01 22:57                             ` Daniel Barkalow
  2 siblings, 0 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01 22:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git



On Wed, 1 Feb 2006, Junio C Hamano wrote:
> 
> ... the original complaint was that "git commit" without
> explicit paths does not quack like "cvs/svn commit" -- commit
> all my changes in the working tree.

Agreed. However, I think that one is pretty easy to explain, and 
conceptually it's not a problem to just tell people to use the "-a" flag 
if they want to get CVS/SVN semantics.

After all, "git commit" will actually make it pretty obvious in the commit 
message status, _and_ if you haven't done any "git add" you'll get the 
"nothing to commit" thing, so it's not like this is hard to explain.

The real _confusion_ I think came from the filename usage.

		Linus

* Re: [Census] So who uses git?
  2006-02-01 21:59                           ` Junio C Hamano
@ 2006-02-01 22:25                             ` Nicolas Pitre
  2006-02-01 22:50                               ` Junio C Hamano
  2006-02-01 22:35                             ` Linus Torvalds
  2006-02-01 22:57                             ` Daniel Barkalow
  2 siblings, 1 reply; 84+ messages in thread
From: Nicolas Pitre @ 2006-02-01 22:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

On Wed, 1 Feb 2006, Junio C Hamano wrote:

> To recap:
> 
>  - "git commit fileA..." means: update index at listed paths
>    (add/remove if necessary) and then commit the tree described
>    in index (the same as the current behaviour with explicit
>    paths).

No.

>  - "git commit -a" means: update index with all local changes and
>    then commit the tree described in index (the same as the
>    current behaviour).

Sensible.

>  - "git commit" means: write out the current index and commit
>    (the same as the current behaviour).

Sensible.

>  - "git commit --only fileA..." means: create a temporary index
>    from the current HEAD commit (or empty index if there is
>    none), update it at listed paths (add/remove if necessary)
>    and commit the resulting tree.  Also update the real index at
>    the listed paths (add/remove if necessary).  In the original
>    index file, the paths listed must be either empty or match
>    exactly the HEAD commit -- otherwise we error out (Linus'
>    suggestion).

Actually, my opinion is that should be the behavior for your first item 
above (when only filenames are specified).  If you want to _also_ 
include the index like you describe in your first item then an 
additional switch should be provided.

In other words, the --only should become --with-index with the behavior 
swapped.

The fact is that when you simply specify a filename, you really expect 
_only_ that filename will be affected and the rest be left alone.  
That's the most probable expectation for any tool.  If you want 
_additional_ stuff to also be merged along with the files specified then 
it is logical to have an additional argument in that case, not the other 
way around.


Nicolas

* Re: [Census] So who uses git?
  2006-02-01 20:27                       ` Junio C Hamano
  2006-02-01 21:09                         ` Linus Torvalds
@ 2006-02-01 22:00                         ` Joel Becker
  1 sibling, 0 replies; 84+ messages in thread
From: Joel Becker @ 2006-02-01 22:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Linus Torvalds, git

On Wed, Feb 01, 2006 at 12:27:17PM -0800, Junio C Hamano wrote:
>  - "git commit fileA..." means: create a temporary index from the
>    current HEAD commit (or empty index if there is none), update
>    it at listed paths (add/remove if necessary) and commit the

	Please don't do the add/remove automatically.  I know, it's
pretty convenient if I explicitly say "git commit filetoadd", but what
happens if I say "git commit libfoo/*"?  I know that I want all my
changes in libfoo/ to be committed, ignoring my changes in libbar/.  But
I forgot that I created libfoo/testfoo.c to debug my changes, and now
it's in the repository -- and I might not even notice it for weeks.
	CVS and Subversion require an explicit "add" for this very
reason.  Even then, almost everyone gets an "import" or two wrong,
pulling in a couple built files (eg, "configure") they didn't mean to
get.
	I guess you could query the user.  "I noticed that you specified
filetoadd, and you never said 'git add'.  Do you want to add it now
[Y/n]?"
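
One way to picture the request above is a commit that splits the paths
given on the command line into those already tracked and those that
would have to be added, and only acts on the former without asking.
This is a hypothetical helper, not part of git; the file names are taken
from the example above, plus a made-up libfoo/foo.c.

```python
def split_paths(requested, tracked):
    """Partition requested paths into already-tracked and untracked ones."""
    to_commit = [p for p in requested if p in tracked]
    untracked = [p for p in requested if p not in tracked]
    return to_commit, untracked


# What the shell might expand "libfoo/*" to.
requested = ["libfoo/foo.c", "libfoo/testfoo.c"]
tracked = {"libfoo/foo.c", "libbar/bar.c"}

to_commit, untracked = split_paths(requested, tracked)
# to_commit -> ["libfoo/foo.c"]
# untracked -> ["libfoo/testfoo.c"]  (warn or ask, do not add silently)
```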

Joel


-- 

"When I am working on a problem I never think about beauty. I
 only think about how to solve the problem. But when I have finished, if
 the solution is not beautiful, I know it is wrong."
         - Buckminster Fuller

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

* Re: [Census] So who uses git?
  2006-02-01 21:09                         ` Linus Torvalds
  2006-02-01 21:34                           ` Nicolas Pitre
@ 2006-02-01 21:59                           ` Junio C Hamano
  2006-02-01 22:25                             ` Nicolas Pitre
                                               ` (2 more replies)
  1 sibling, 3 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01 21:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, git

Linus Torvalds <torvalds@osdl.org> writes:

>>  - "git commit" means: update index with all local changes and
>>    then commit the tree described in index (current "-a"
>>    behaviour).
>
> No. Please no. "git commit" should continue to do what it does now. 
> Otherwise you can't do the two-stage thing in any sane way.
> Requiring "--incremental"/"--also" is very confusing.

I myself did not like it but...

> If somebody doesn't know about the index, he normally will never have 
> index changes _anyway_, except for the "git add" case. In which case "git 
> commit" does the right thing for him: it will either commit the added 
> files, or it will say "nothing to commit".

... the original complaint was that "git commit" without
explicit paths does not quack like "cvs/svn commit" -- commit
all my changes in the working tree.

And actually the one you are responding to was my cunning move
to pull this exact reaction from you: "No, commit without
parameter should not imply -a".  I prefer the "minor twist"
version in the same message myself.

To recap:

 - "git commit fileA..." means: update index at listed paths
   (add/remove if necessary) and then commit the tree described
   in index (the same as the current behaviour with explicit
   paths).

 - "git commit -a" means: update index with all local changes and
   then commit the tree described in index (the same as the
   current behaviour).

 - "git commit" means: write out the current index and commit
   (the same as the current behaviour).

 - "git commit --only fileA..." means: create a temporary index
   from the current HEAD commit (or empty index if there is
   none), update it at listed paths (add/remove if necessary)
   and commit the resulting tree.  Also update the real index at
   the listed paths (add/remove if necessary).  In the original
   index file, the paths listed must be either empty or match
   exactly the HEAD commit -- otherwise we error out (Linus'
   suggestion).

 - In all cases, revert the index to the state before the
   command is run if we end up not making the commit (e.g. index
   unmerged, empty log message, pre-commit hook refusal).  With
   this, "git diff-files fileA" would show the differences as it
   showed before an aborted "git commit -a" or "git commit fileA"
   and removes one common gripe.
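
The difference between the first and the fourth item in the recap above
can be sketched with plain dictionaries standing in for HEAD, the index
and the working tree (path -> content).  This is only a toy model of the
proposal as worded in this message; the function names are made up, and
"empty" is read here as "the path is not in the index at all".

```python
class CommitError(Exception):
    pass


def commit_paths(head, index, worktree, paths):
    """Current behaviour: update index at paths, commit the whole index."""
    new_index = dict(index)
    for p in paths:
        if p in worktree:
            new_index[p] = worktree[p]      # add or update
        else:
            new_index.pop(p, None)          # removed from the working tree
    return dict(new_index), new_index      # (new HEAD tree, new index)


def commit_only(head, index, worktree, paths):
    """Proposed --only: commit HEAD plus the listed paths, nothing else."""
    for p in paths:
        # Listed paths must be absent from the index or match HEAD exactly.
        if p in index and index[p] != head.get(p):
            raise CommitError(f"{p}: index differs from HEAD")
    temp = dict(head)                       # temporary index built from HEAD
    new_index = dict(index)
    for p in paths:
        if p in worktree:
            temp[p] = new_index[p] = worktree[p]
        else:
            temp.pop(p, None)
            new_index.pop(p, None)
    return temp, new_index


head = {"a.c": "v1", "b.c": "v1"}
index = {"a.c": "v1", "b.c": "v2"}         # b.c already staged
worktree = {"a.c": "v2", "b.c": "v2"}

also_head, _ = commit_paths(head, index, worktree, ["a.c"])
only_head, only_index = commit_only(head, index, worktree, ["a.c"])
```

With the staged change to b.c in place, the first form commits both
files, while the --only form commits a.c alone and leaves b.c staged in
the real index.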

* Re: [Census] So who uses git?
  2006-02-01 21:09                         ` Linus Torvalds
@ 2006-02-01 21:34                           ` Nicolas Pitre
  2006-02-01 21:59                           ` Junio C Hamano
  1 sibling, 0 replies; 84+ messages in thread
From: Nicolas Pitre @ 2006-02-01 21:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Wed, 1 Feb 2006, Linus Torvalds wrote:

> 
> 
> On Wed, 1 Feb 2006, Junio C Hamano wrote:
> > 
> > How about this:
> > 
> >  - "git commit --also fileA..." means: update index at listed
> >    paths (add/remove if necessary) and then commit the tree
> >    described in index (the current behaviour with explicit paths).
> 
> I'd suggest "--incremental" instead of "--also".
> 
> >  - "git commit fileA..." means: create a temporary index from the
> >    current HEAD commit (or empty index if there is none), update
> >    it at listed paths (add/remove if necessary) and commit the
> >    resulting tree.  Also update the real index at the listed
> >    paths (add/remove if necessary).  In the original index file,
> >    the paths listed must be either empty or match exactly the
> >    HEAD commit -- otherwise we error out (Linus' suggestion).
> 
> Yes.

Agreed.

> >  - "git commit" means: update index with all local changes and
> >    then commit the tree described in index (current "-a"
> >    behaviour).
> 
> No. Please no. "git commit" should continue to do what it does now. 
> Otherwise you can't do the two-stage thing in any sane way.
> 
> Requiring "--incremental"/"--also" is very confusing.
> 
> If somebody doesn't know about the index, he normally will never have 
> index changes _anyway_, except for the "git add" case. In which case "git 
> commit" does the right thing for him: it will either commit the added 
> files, or it will say "nothing to commit".

Sensible.  As long as "commit files..." actually commits _only_ those 
files unless --index (or something) is specified to also explicitly 
include the index changes.

What is really counter-intuitive is to have index changes merged by 
default when a single file is specified as argument to commit.


Nicolas

* Re: [Census] So who uses git?
  2006-02-01 20:27                       ` Junio C Hamano
@ 2006-02-01 21:09                         ` Linus Torvalds
  2006-02-01 21:34                           ` Nicolas Pitre
  2006-02-01 21:59                           ` Junio C Hamano
  2006-02-01 22:00                         ` Joel Becker
  1 sibling, 2 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01 21:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git



On Wed, 1 Feb 2006, Junio C Hamano wrote:
> 
> How about this:
> 
>  - "git commit --also fileA..." means: update index at listed
>    paths (add/remove if necessary) and then commit the tree
>    described in index (the current behaviour with explicit paths).

I'd suggest "--incremental" instead of "--also".

>  - "git commit fileA..." means: create a temporary index from the
>    current HEAD commit (or empty index if there is none), update
>    it at listed paths (add/remove if necessary) and commit the
>    resulting tree.  Also update the real index at the listed
>    paths (add/remove if necessary).  In the original index file,
>    the paths listed must be either empty or match exactly the
>    HEAD commit -- otherwise we error out (Linus' suggestion).

Yes.

>  - "git commit" means: update index with all local changes and
>    then commit the tree described in index (current "-a"
>    behaviour).

No. Please no. "git commit" should continue to do what it does now. 
Otherwise you can't do the two-stage thing in any sane way.

Requiring "--incremental"/"--also" is very confusing.

If somebody doesn't know about the index, he normally will never have 
index changes _anyway_, except for the "git add" case. In which case "git 
commit" does the right thing for him: it will either commit the added 
files, or it will say "nothing to commit".

		Linus

* Re: [Census] So who uses git?
  2006-02-01  9:59                         ` Randal L. Schwartz
@ 2006-02-01 20:48                           ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01 20:48 UTC (permalink / raw)
  To: Randal L. Schwartz; +Cc: git

merlyn@stonehenge.com (Randal L. Schwartz) writes:

>>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:
>
> Junio> *1* The reason he has unrelated changes while doing a merge is
> Junio> because he works on things himself (I am speculating about
> Junio> this),
>
> You need to speculate that Linus works on things himself? :)

Forgot a smiley ;-).

* Re: [Census] So who uses git?
  2006-02-01 17:18                     ` Nicolas Pitre
@ 2006-02-01 20:27                       ` Junio C Hamano
  2006-02-01 21:09                         ` Linus Torvalds
  2006-02-01 22:00                         ` Joel Becker
  0 siblings, 2 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01 20:27 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git

Nicolas Pitre <nico@cam.org> writes:

> On Tue, 31 Jan 2006, Junio C Hamano wrote:
>
>> People who do not like this can set in their config file some
>> flag, say, 'core.index = understood', to get the current
>> behaviour.
>
> I'd avoid hidden config options that magically change behaviors and 
> semantics like that as much as possible....

I agree; it was tongue-in-cheek sort of suggestion ;-)

> It is much more intuitive to expect that, if you specify path arguments 
> to commit, then only those paths are considered, and even if you didn't 
> do a git add on some of them.  If nothing is specified then the current 
> index (the default, including a-new-file) is considered.

Good thinking.  I was not thinking about the case where you
explicitly list an untracked file to be added.

>  - a non-merge commit without any argument would imply -a.
>
>  - a non-merge commit with path arguments implies _only_ those paths, 
>    regardless if they were previously "git add"ed or not.
>
>  - a non-merge commit with, say, --no-auto or --current-index or 
>    whatever would preserve the current behavior, with or without 
>    additional paths.
>
>  - a merge commit ...
>  - a merge commit ...
>
> This might look complicated when presented like that, but I think that 
> the default behavior of each (non-merge vs merge) commit would more 
> closely fit most people's expectations....

If I may correct what I said earlier, I now realize the
"automatic -a is dangerous" argument does not have anything to
do with merges.  If the user usually works with a dirty working
tree, is aware of the index, and takes advantage of the index as
the staging area for the next commit, your --no-auto would be
needed to help her workflow.  I in principle agree with the
first three items in the above summary, except that I think it
would make more sense to do that for all commits.

How about this:

 - "git commit --also fileA..." means: update index at listed
   paths (add/remove if necessary) and then commit the tree
   described in index (the current behaviour with explicit paths).

 - "git commit fileA..." means: create a temporary index from the
   current HEAD commit (or empty index if there is none), update
   it at listed paths (add/remove if necessary) and commit the
   resulting tree.  Also update the real index at the listed
   paths (add/remove if necessary).  In the original index file,
   the paths listed must be either empty or match exactly the
   HEAD commit -- otherwise we error out (Linus' suggestion).

 - "git commit" means: update index with all local changes and
   then commit the tree described in index (current "-a"
   behaviour).

 - In all cases, revert the index to the state before the
   command is run if we end up not making the commit (e.g. index
   unmerged, empty log message, pre-commit hook refusal).
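
A minimal sketch of the last item, assuming a scratch repository whose index lives at .git/index. The helper name try_commit is hypothetical (it is not a git command), and the "commit" step is simulated by a command that always fails:

```shell
#!/bin/sh
# Sketch: put the index back if the commit is not made.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > fileA
git add fileA
git commit -q -m initial

# Hypothetical helper: save the index, stage the listed paths, then run
# "$1" as the commit step; restore the saved index if that step fails.
try_commit() {
	step=$1; shift
	cp .git/index .git/index.saved
	git update-index --add -- "$@"
	if "$step"; then
		rm -f .git/index.saved
	else
		mv .git/index.saved .git/index
		return 1
	fi
}

echo "one, edited" > fileA
try_commit false fileA || echo "commit aborted, index restored"

# The index matches HEAD again, so nothing is left staged.
staged=$(git diff-index --cached --name-only HEAD)
echo "staged after abort: [$staged]"
```

Copying the whole index file is the bluntest way to get the "revert" behaviour; a real implementation would more likely work on a temporary index from the start.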

Experienced git users would end up saying "--also" without
explicit paths to defeat the automatic -a behaviour all the
time, and while the flag --also makes perfect sense when used
with one or more paths, using it like this looks awkward:

        $ edit some-file
        $ git update-index some-file
        $ git commit --also

It's just a flag name, so we could make --no-auto a synonym for --also.

A minor twist of the above to make it friendlier to the current
git users is to do this:

 - "git commit fileA...", "git commit -a", and "git commit" keep
   the existing semantics.

 - "git commit --only fileA..." does the new temporary index
   thing.

This has an advantage that existing use is not affected, and
another advantage is that internally it is more consistent ("git
commit" is a natural extension of "git commit fileA..." with
zero path).  But one possible downside is that you need to
explicitly say --only when you want cvs-like "commit".

Since we are discussing the fact that people find the existing
interface unintuitive, being consistent with the current
usage may not count as a big advantage after all.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 17:30             ` Linus Torvalds
  2006-01-31 18:12               ` J. Bruce Fields
  2006-01-31 19:01               ` Keith Packard
@ 2006-02-01 19:34               ` H. Peter Anvin
  2 siblings, 0 replies; 84+ messages in thread
From: H. Peter Anvin @ 2006-02-01 19:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Carl Baldwin, Junio C Hamano, Keith Packard,
	Martin Langhoff, Git Mailing List

Linus Torvalds wrote:
>>
>>For example, I had a hard time explaining to a friend why a git-add'ed 
>>file is committed when saying "git commit some_other_file", but not 
>>another (modified) file. Very unintuitive.
> 
> I really think you should explain it one of two ways:
> 
>  - ignore it. Never _ever_ use git-update-index directly, and don't tell 
>    people about using individual filenames with git-commit. Maybe even add 
>    "-a" by default to the git-commit flags as a special installation 
>    addition.
> 
>  - talk about the index, and revel in it as a way to explain the staging 
>    area. This is what the old tutorial.txt did before it got simplified.
> 
> The "ignore the index" approach is the simple one to explain. It's 
> strictly less powerful, but hey, what else is new? 
> 

I think both of these are probably the wrong answer, and it's pretty 
much a matter of the git model violating the principle of least 
surprise.  Perhaps added (or removed?) files need to be handled in a 
different way than they currently are.

	-hpa

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-30 18:58         ` Carl Baldwin
  2006-01-31 10:27           ` Johannes Schindelin
@ 2006-02-01 19:32           ` H. Peter Anvin
  1 sibling, 0 replies; 84+ messages in thread
From: H. Peter Anvin @ 2006-02-01 19:32 UTC (permalink / raw)
  To: Carl Baldwin
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Carl Baldwin wrote:
> 
> - Anyone can install and fire it up without license/contract hassles.
> 

For something like an SCM this is a big deal, and not just for the Open 
Source world.  In a company, it means not having to worry about having 
enough licenses, and getting budget approval, etc, etc, before a new 
person can join a project.  Perhaps more importantly, it allows someone 
who normally isn't *on* the project to look at it and participate.

	-hpa

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  3:48                       ` Linus Torvalds
@ 2006-02-01 19:30                         ` H. Peter Anvin
  0 siblings, 0 replies; 84+ messages in thread
From: H. Peter Anvin @ 2006-02-01 19:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Ray Lehtiniemi, Alex Riesen,
	Radoslaw Szkodzinski, Keith Packard, Junio C Hamano, cworth,
	Git Mailing List

Linus Torvalds wrote:
> 
> It's not magic, and it's not all that recent. Linux FS ops have always 
> been pretty good, and the dentry cache was introduced in 2.0.x, I think, 
> so you'd be hard-pressed to find a Linux system that doesn't have it.
> 

2.1.14, I seem to remember -- it was definitely 2.1.1x-ish.  I mostly 
recall because autofs didn't just break horribly, it took adding several 
dcache hooks to make it work again :)

	-hpa

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01 19:20                       ` Julian Phillips
@ 2006-02-01 19:29                         ` Linus Torvalds
  0 siblings, 0 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01 19:29 UTC (permalink / raw)
  To: Julian Phillips
  Cc: Ray Lehtiniemi, Alex Riesen, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Martin Langhoff, Git Mailing List



On Wed, 1 Feb 2006, Julian Phillips wrote:
> 
> As it happens, yes ... I can't say that I've noticed git being particularly
> slow, but then - I've not tried running git with a local repos ... ;)

Well, NFS seems to be ok. Which is not that surprising: NFS has gotten a 
_lot_ of attention in the caching area (I worked on it myself a couple of 
years back when the page cache transition happened during 2.3.x, but 
happily we've had very good NFS maintainership since, so I don't get 
involved any more).

Your numbers show that NFS is fine (my "benchmark" is that I refuse to see 
the kinds of commit times that "cvs commit" does - easily several minutes 
for a big project. If it goes over 2 seconds, it's painful, and over ten 
seconds is totally unacceptable).

Your numbers seem to say that at least with a good network/server, NFS on 
Linux is not a problem at all.

CIFS is likely a very different animal. I suspect the cifs people have 
spent a whole lot more effort on strange Windows interaction issues than 
on trying to make sure that cached performance is top-notch.

			Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
                                         ` (2 preceding siblings ...)
  2006-02-01 16:15                       ` Jason Riedy
@ 2006-02-01 19:20                       ` Julian Phillips
  2006-02-01 19:29                         ` Linus Torvalds
  2006-02-06 21:15                       ` Chuck Lever
  4 siblings, 1 reply; 84+ messages in thread
From: Julian Phillips @ 2006-02-01 19:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ray Lehtiniemi, Alex Riesen, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Martin Langhoff, Git Mailing List

On Tue, 31 Jan 2006, Linus Torvalds wrote:

> Sounds like every single stat() will go out the wire. I forget what the
> Linux NFS client does, but I _think_ it has a metadata timeout that avoids
> this. But it might be as bad under NFS.
>
> Has anybody used git over NFS? If it's this bad (or even close to), I
> guess the "mark files as up-to-date in the index" approach is a really
> good idea..

As it happens, yes ... I can't say that I've noticed git being 
particularly slow, but then - I've not tried running git with a local 
repos ... ;)

using a recentish 2.6 kernel repos, directly on the server I get:

server: linux-2.6>time git update-index --refresh

real    0m0.067s
user    0m0.015s
sys     0m0.052s

then against the same repos over NFS, I get:

client: linux-2.6>time git update-index --refresh

real    0m1.578s
user    0m0.018s
sys     0m0.366s

and if I do it from the client again soon afterward I get:

client: linux-2.6>time git update-index --refresh

real    0m0.145s
user    0m0.012s
sys     0m0.118s
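
To put the three "real" times side by side, here is a small sketch that only divides the figures quoted above; the variable names are made up for the illustration:

```shell
# "real" times from the three runs above, in seconds.
local_s=0.067
nfs_first=1.578
nfs_second=0.145

# Slowdown of each NFS run relative to the run on the server itself.
first_ratio=$(LC_ALL=C awk -v a="$nfs_first" -v b="$local_s" 'BEGIN { printf "%.1f", a / b }')
second_ratio=$(LC_ALL=C awk -v a="$nfs_second" -v b="$local_s" 'BEGIN { printf "%.1f", a / b }')

echo "first NFS run:  ${first_ratio}x the local time"
echo "second NFS run: ${second_ratio}x the local time"
```

So the first run over NFS is roughly 24 times slower than the local one, and the repeat run is only about twice as slow.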

>
> Of course, the whole point of git is that you should keep your repository
> close, but sometimes NFS - or similar - is enforced upon you by other
> issues, like the fact that the powers-that-be want anonymous workstations
> and everybody should work with a home-directory automounted over NFS..
>

-- 
Julian

  ---
You know it's going to be a bad day when you want to put on the clothes
you wore home from the party and there aren't any.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  6:42                   ` Junio C Hamano
  2006-02-01  7:22                     ` Carl Worth
  2006-02-01 17:11                     ` Linus Torvalds
@ 2006-02-01 17:18                     ` Nicolas Pitre
  2006-02-01 20:27                       ` Junio C Hamano
  2 siblings, 1 reply; 84+ messages in thread
From: Nicolas Pitre @ 2006-02-01 17:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

On Tue, 31 Jan 2006, Junio C Hamano wrote:

> This "I thought I was only checking in the two-liner I did as
> the last step but you committed the whole thing, stupid git!"
> confusion feels to be a parallel of "I thought I was only
> checking in the files I specified on the command line but you
> also committed the files I earlier git-add'ed, stupid git!"
> confusion.
> 
> Taken together with your "during a partially conflicted merge"
> example, it feels to me that the simplest safety valve would be
> to refuse "git commit paths..." if the index does not exactly
> match HEAD.  Not just mentioned paths but anywhere.
> 
> People who do not like this can set in their config file some
> flag, say, 'core.index = understood', to get the current
> behaviour.

I'd avoid hidden config options that magically change behaviors and 
semantics like that as much as possible.  _This_ would pave the way to 
even greater confusion and prevent the git user base from converging on 
a unified semantics knowledge.  Better add a command line option which 
has the virtue of being visible, and name it such that it makes the 
intention explicit whether the previous index state is preserved or not,
something like --current-index or the like.

> The reason I am bringing this up is because of this command
> sequence:
> 
> 	# start from a clean tree, after 'git reset --hard'
>         $ create a-new-file
>         $ git add a-new-file
>         $ edit existing-file
>         $ edit another-file
>         $ git commit existing-file
> 
> There is no question we do not commit "another-file" and we do
> commit changes to the "existing-file" as a whole.  What should
> we do to "a-new-file", and how do we explain why we do so to
> novices?
> 
> We can argue it either way.  We could say we shouldn't because
> "commit" argument does not mention it.  We could say we should
> because the user already told that he wants to add that file to
> git.  Either makes sort-of sense from what the end user did.

It is much more intuitive to expect that, if you specify path arguments 
to commit, then only those paths are considered, and even if you didn't 
do a git add on some of them.  If nothing is specified then the current 
index (the default, including a-new-file) is considered.

> I think a file "cvs add"ed is committed if whole subdirectory
> commit (similar to our "commit -a") is done or the file is
> explicitly specified on the "cvs commit" command line, and that
> may match people's expectations.  That's an argument for not
> committing "a-new-file".

Exactly.

> But to be consistent with that, this should not commit anything:
> 
>         # the same clean tree.
> 	$ create a-new-file
>         $ git add a-new-file
>         $ git commit
> 
> Which is counterintuitive to me by now (because I played too
> long with git).

IMHO this should commit a-new-file simply because you added it to the 
index and a commit without any argument should commit the whole 
(refreshed) index.

> We could make "git commit" without paths to mean the current
> "-a" behaviour, which would match CVS behaviour more closely.

Exactly.

> However, it would make commit after a merge conflict resolution
> in a dirty working tree _very_ dangerous -- it may give more
> familiar feel to CVS people, but it is not an improvement for
> git people at all.  I would rather not.

For that case, (assuming that -a would be the default) maybe something 
meaning the opposite of -a could be specified on the commit argument 
list like I suggested earlier.  And maybe it should always be the 
default when committing a merge (in which case the -a would override 
that and refresh everything and not only the merged files plus those 
specified on the command line).

So to summarize:

 - a non-merge commit without any argument would imply -a.

 - a non-merge commit with path arguments implies _only_ those paths, 
   regardless if they were previously "git add"ed or not.

 - a non-merge commit with, say, --no-auto or --current-index or 
   whatever would preserve the current behavior, with or without 
   additional paths.

 - a merge commit would imply that --no-auto behavior automatically.

 - a merge commit could override the --no-auto with an explicit -a.

This might look complicated when presented like that, but I think that 
the default behavior of each (non-merge vs merge) commit would more 
closely fit most people's expectations.  The merge commit creates a shift 
in semantics of course, but committing a merge is already something a 
bit more involved anyway and at that point git users should have gained 
a bit more experience with the index concept and the default merge 
behavior is probably what most people will expect at that point as well.
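
The five rules in the summary can be written down as a small decision function. This is only a sketch: commit_mode is a hypothetical helper that prints which changes a commit would pick up, it does not run git, and the handling of "-a" together with paths is an assumption the summary does not cover.

```shell
# Hypothetical helper. Usage: commit_mode merge|normal [-a|--no-auto] [paths...]
commit_mode() {
	kind=$1; shift
	flag=
	case ${1-} in
	-a|--no-auto) flag=$1; shift ;;
	esac
	# A merge commit behaves as if --no-auto had been given,
	# unless -a overrides it explicitly.
	if [ "$kind" = merge ] && [ -z "$flag" ]; then
		flag=--no-auto
	fi
	case $flag in
	-a)        echo "all local changes" ;;
	--no-auto) echo "current index${1:+ plus $*}" ;;
	*)
		if [ $# -gt 0 ]; then
			echo "only $*"
		else
			echo "all local changes"
		fi ;;
	esac
}

commit_mode normal                  # all local changes
commit_mode normal fileA            # only fileA
commit_mode normal --no-auto fileA  # current index plus fileA
commit_mode merge                   # current index
commit_mode merge -a                # all local changes
```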


Nicolas

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  6:42                   ` Junio C Hamano
  2006-02-01  7:22                     ` Carl Worth
@ 2006-02-01 17:11                     ` Linus Torvalds
  2006-02-01 17:18                     ` Nicolas Pitre
  2 siblings, 0 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01 17:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Tue, 31 Jan 2006, Junio C Hamano wrote:
> 
> Taken together with your "during a partially conflicted merge"
> example, it feels to me that the simplest safety valve would be
> to refuse "git commit paths..." if the index does not exactly
> match HEAD.  Not just mentioned paths but anywhere.

But at that point, the existing "git commit" semantics actually are the 
ones we'd use, and the only difference ends up being that we error out 
if the index doesn't match HEAD.

The problem with that is that it appears that some of the people who don't 
like the current "git commit <filename>" thing _do_ actually understand 
the index, but they want to commit just that one file. 

So at least from my understanding, I think Dscho was arguing for the new 
semantics of "git commit <file>" to _work_, but to only commit <file>, 
even if he does understand the index perfectly well, and might have done a 
"git add" or updated a file for some other reason..

Btw, one thing that _can_ be confusing is that you do

	git commit fileA

and then when you edit the commit message, you realize that you don't 
actually want to do this at all, so you exit out of the editor without 
changes (which aborts the commit). Now "git commit" will not actually have 
done the commit, but it _will_ have done the "git-update-index" on that 
file.

So next time, when you do

	git commit fileB

you'll currently commit _both_ fileA and fileB.
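
The effect can be reproduced in a scratch repository with plumbing commands only. In this sketch the aborted "git commit fileA" is simulated by the update-index call it leaves behind, and the tree that "git commit fileB" would record is built with write-tree instead of actually committing:

```shell
#!/bin/sh
# Sketch: a path staged by an aborted commit rides along with the next one.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > fileA
echo b > fileB
git add fileA fileB
git commit -q -m initial

echo "a, edited" > fileA
echo "b, edited" > fileB

# What the aborted "git commit fileA" left behind.
git update-index fileA

# What "git commit fileB" then does before committing the index.
git update-index fileB
tree=$(git write-tree)

# Paths that differ between HEAD and the tree about to be committed.
committed=$(git diff-tree -r --name-only HEAD "$tree")
echo "$committed"
```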

This is, in my opinion, the biggest argument for the suggested _new_ 
semantics: if you explicitly name a set of files, it should always do a

	# Verify current state
	parent=$(git-rev-parse --verify HEAD) || exit

	# Verify that the current index is ok in the named files
	a=$(git-diff-index --name-only --cached $parent -- "$@") || exit
	if [ "$a" ]; then
	   echo -e >&2 "Files are changed in the index:\n  $a"
	   exit 2
	fi

	# create the new tree object
	export GIT_INDEX_FILE=tmpfile
	newtree=$(git-read-tree $parent &&
	  git-update-index "$@" &&
	  git-write-tree) || exit

	# edit message
	... edit message ..

	# do commit
	newhead=$(git-commit-tree $newtree -p $parent < msg)
	git-update-ref HEAD $newhead $parent

or similar. That has the advantage that if we _do_ decide to break out of 
the commit, we will not have changed the current index (only the temporary 
one).
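
A scratch-repository sketch of the temporary-index idea, to show that the real index is left alone. The path .git/index.tmp and the file names are assumptions made for the demonstration, and the commit itself is left out; only the tree is built.

```shell
#!/bin/sh
# Sketch: build the tree for "commit fileB" in a temporary index.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > fileA
echo b > fileB
git add fileA fileB
git commit -q -m initial

echo "a, edited" > fileA
echo "b, edited" > fileB
git update-index fileA      # staged earlier, not meant for this commit

parent=$(git rev-parse --verify HEAD)

# The named file must still match the parent in the real index.
changed=$(git diff-index --cached --name-only "$parent" -- fileB)
[ -z "$changed" ] || { echo "fileB already changed in the index" >&2; exit 2; }

# Start the temporary index from the parent and add only fileB.
tmp_index=.git/index.tmp
newtree=$(
	GIT_INDEX_FILE=$tmp_index; export GIT_INDEX_FILE
	git read-tree "$parent" &&
	git update-index fileB &&
	git write-tree
)
rm -f "$tmp_index"

in_commit=$(git diff-tree -r --name-only "$parent" "$newtree")
still_staged=$(git diff-index --cached --name-only "$parent")
echo "would commit:            $in_commit"
echo "real index still stages: $still_staged"
```

The temporary index is exported only inside the command substitution, so every later command sees the real index again.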

		Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01 14:55                       ` Alex Riesen
@ 2006-02-01 16:25                         ` Linus Torvalds
  2006-02-02  9:12                           ` Alex Riesen
  0 siblings, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01 16:25 UTC (permalink / raw)
  To: Alex Riesen
  Cc: Martin Langhoff, Ray Lehtiniemi, Radoslaw Szkodzinski,
	Keith Packard, Junio C Hamano, cworth, Git Mailing List



On Wed, 1 Feb 2006, Alex Riesen wrote:
> 
> $ time git update-index --refresh
> 
> real    0m21.500s
> user    0m0.358s
> sys     0m1.406s
> 
> WinNT, NTFS, 13k files, hot cache.

That's 25% fewer files than the Linux kernel, and I can do that operation 
in 0m0.062s (0.012s user, 0.048s system).

So WinNT/cygwin is about 2.5 _orders_of_magnitude_ slower here, or 340 
times slower.
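
As a check on that arithmetic, the two "real" times quoted above can be divided directly. This is only a sketch of the calculation and the variable names are made up:

```shell
# "real" times quoted above, in seconds.
nt_s=21.500
linux_s=0.062

# Plain ratio, and the same ratio as a power of ten.
ratio=$(LC_ALL=C awk -v a="$nt_s" -v b="$linux_s" 'BEGIN { printf "%.0f", a / b }')
orders=$(LC_ALL=C awk -v a="$nt_s" -v b="$linux_s" 'BEGIN { printf "%.1f", log(a / b) / log(10) }')

echo "ratio: ${ratio}x, or about ${orders} orders of magnitude"
```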

Now, I'm tempted to say that NT is a piece of sh*t, but the fact is, your 
CPU-times seem to indicate that most of it is IO (and the "real" cost is 
just 1.7 seconds, much of which is system time, which in turn itself is 
probably due to the IO costs too - so even that isn't directly 
comparable).

Which may mean that you simply don't have enough memory to cache the whole 
thing. Which may be NT sucking, of course ("we don't like to use more than 
10% of memory for caches"), but it might also be a tunable (which is sucky 
in itself, of course), but finally, it might just be that you just don't 
have a ton of memory. I've got 2GB in my machines, although 1GB is plenty 
to cache the kernel.

			Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
  2006-02-01  2:09                       ` Linus Torvalds
  2006-02-01  2:31                       ` Junio C Hamano
@ 2006-02-01 16:15                       ` Jason Riedy
  2006-02-01 19:20                       ` Julian Phillips
  2006-02-06 21:15                       ` Chuck Lever
  4 siblings, 0 replies; 84+ messages in thread
From: Jason Riedy @ 2006-02-01 16:15 UTC (permalink / raw)
  Cc: Git Mailing List

And Linus Torvalds writes:
 - 
 - Has anybody used git over NFS? If it's this bad (or even close to), I 
 - guess the "mark files as up-to-date in the index" approach is a really 
 - good idea..

My normal use is on NFS (Solaris and Linux) and IBM's GPFS 
(AIX and Linux).  I haven't noticed any particular problems, 
and LAPACK and the reference BLAS make a moderately sized 
working set of around 3000 source files.  Not kernel sized, 
but not tiny.

However, I mostly use git over NFS on a relatively slow 
machine.  NFS is faster than the local disk...

Jason

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:52                     ` Martin Langhoff
  2006-02-01  3:48                       ` Linus Torvalds
@ 2006-02-01 14:55                       ` Alex Riesen
  2006-02-01 16:25                         ` Linus Torvalds
  1 sibling, 1 reply; 84+ messages in thread
From: Alex Riesen @ 2006-02-01 14:55 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Ray Lehtiniemi, Linus Torvalds, Radoslaw Szkodzinski,
	Keith Packard, Junio C Hamano, cworth, Git Mailing List

On 2/1/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Perhaps a local git/cygwin on NTFS  would be more reasonable to benchmark?

$ time git update-index --refresh

real    0m21.500s
user    0m0.358s
sys     0m1.406s

WinNT, NTFS, 13k files, hot cache.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 22:55                   ` Joel Becker
@ 2006-02-01 14:43                     ` Johannes Schindelin
  0 siblings, 0 replies; 84+ messages in thread
From: Johannes Schindelin @ 2006-02-01 14:43 UTC (permalink / raw)
  To: Joel Becker
  Cc: Linus Torvalds, Keith Packard, Carl Baldwin, Junio C Hamano,
	Martin Langhoff, Git Mailing List

Hi,

On Tue, 31 Jan 2006, Joel Becker wrote:

> On Tue, Jan 31, 2006 at 11:21:52AM -0800, Linus Torvalds wrote:
> > Now, I do agree. I don't actually like hiding the index too much. 
> > Understanding the index is _invaluable_ whenever you're doing a merge with 
> > conflicts, and understanding what tools are available to you to resolve 
> > those conflicts.
> 
> 	This is precisely the experience I've had explaining GIT to
> folks moving to it.  The simplest workflow (clone; hack one file, commit
> one file) is so similar to CVS/Subversion/Anything that it's immediately
> understood.  But when pull, push, merge, and any non-linear history are
> discussed, I have to describe the index and the commit/tree layout.
> Once I do, they get it.
> 
> > So I'm actually of the "revel in the index" camp (as could probably be 
> > guessed by the original tutorial).
> 
> 	I'm going to second this, from a real-world "explain it to
> others" standpoint.

How about talking about the index a bit at the end of tutorial.txt like 
this:

-- snip --
For a number of (mostly technical) reasons, "git diff" does not show the 
changes of the current working directory with respect to the latest 
commit, but rather to an intermediate stage: the "index".

Think of the index as a staging area just before committing: the commit 
object (and the tree and blob objects referenced from it) are assembled 
there.

Also, when you checkout, the index is used to disassemble the commit 
object just before writing the corresponding files and directories.
-- snap --

May this be worth the work?
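
The first paragraph of that text could be backed by a short demonstration. The sketch below uses the plumbing commands git diff-files and git diff-index rather than "git diff" itself, so that the output does not depend on user configuration; the file name and contents are made up.

```shell
#!/bin/sh
# Sketch: working tree vs index, and index vs latest commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > file
git add file
git commit -q -m initial

echo "two, staged" > file
git update-index file              # the index now differs from HEAD
echo "three, not staged" > file    # the working tree now differs from the index

# Working tree against the index: the comparison plain "git diff" makes.
wt_vs_index=$(git diff-files -p | grep '^[-+][^-+]')

# Index against the latest commit.
index_vs_head=$(git diff-index --cached -p HEAD | grep '^[-+][^-+]')

echo "working tree vs index:"
echo "$wt_vs_index"
echo "index vs HEAD:"
echo "$index_vs_head"
```

The first comparison never mentions the content "one", which is the point: it is made against the staged version, not against the commit.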

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  8:26                       ` Junio C Hamano
@ 2006-02-01  9:59                         ` Randal L. Schwartz
  2006-02-01 20:48                           ` Junio C Hamano
  0 siblings, 1 reply; 84+ messages in thread
From: Randal L. Schwartz @ 2006-02-01  9:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carl Worth, Linus Torvalds, git

>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:

Junio> *1* The reason he has unrelated changes while doing a merge is
Junio> because he works on things himself (I am speculating about
Junio> this),

You need to speculate that Linus works on things himself? :)

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  7:22                     ` Carl Worth
@ 2006-02-01  8:26                       ` Junio C Hamano
  2006-02-01  9:59                         ` Randal L. Schwartz
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01  8:26 UTC (permalink / raw)
  To: Carl Worth; +Cc: Linus Torvalds, git

Carl Worth <cworth@cworth.org> writes:

> ... it seems it should be possible to have a class of
> "novice ready" tools that provide for common use cases and that never
> require any mention of the index in their documentation. If so, that
> seems to me a useful goal to work toward and a useful guide in this
> discussion.

I agree it is a worthy goal.  Unfortunately I lost my git
virginity a long time ago, so a fresh perspective is really
appreciated in this discussion.

> ... Could you explain what the danger is here?

As Linus mentioned in an earlier message in this thread, one of
the important tasks for him is to take other people's trees and
merge them into his mainline.  The workflow goes like this:

	$ git pull from-somewhere
        ... oops there are conflicts
        $ edit conflicted/file
        $ edit more/conflicted/file
        ... maybe compile test ...
	$ git diff -c ;# final sanity check
        $ git update-index conflicted/file
        $ git update-index more/conflicted/file
        $ git commit

He does *not* want to do "git commit -a" here, because he
usually has unrelated changes in his working tree he has not
done update-index on and does _not_ want to commit [*1*].  Making
"git commit" imply "git commit -a" would increase the risk of
accidentally committing those unrelated changes mixed in the
merge (eh, actually it makes the risk 100%).

We _could_ detect that we were in the middle of a merge, and
enumerate the paths touched by the merged branches.  Then we can
say paths that are different between the index and the working
tree and not in the paths touched by the merge are his unrelated
changes.  But it is conceivable he may need to modify a file
neither branch touches in order to _logically_ resolve the
merge, even when the merge does not physically conflict on a
textual diff basis, so while that heuristic may work pretty
well most of the time, doing so might make things even
harder to explain to other people.
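
The heuristic amounts to a set difference between two path lists. A minimal sketch, assuming both lists are already available as sorted files with one path per line; the paths are invented, and how the lists would be obtained from git is left open:

```shell
# Hypothetical inputs: sorted path lists, one path per line.
merge_paths=$(mktemp)    # paths touched by the merged branches
dirty_paths=$(mktemp)    # paths that differ between index and working tree
printf '%s\n' drivers/net/a.c fs/b.c > "$merge_paths"
printf '%s\n' fs/b.c kernel/c.c mm/d.c > "$dirty_paths"

# Dirty paths the merge did not touch: the presumed unrelated changes.
unrelated=$(LC_ALL=C comm -13 "$merge_paths" "$dirty_paths")
echo "$unrelated"

rm -f "$merge_paths" "$dirty_paths"
```

A file edited only to resolve the merge logically would land in the "unrelated" list here, which is exactly the weakness described above.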


[Footnotes]

*1* The reason he has unrelated changes while doing a merge is
because he works on things himself (I am speculating about
this), and for these modified paths he never runs git-add nor
git-update-index until he is ready to commit his changes (I am
not speculating about this).  As long as he knows what he is
pulling in from outside does not overlap with what he has been
working on, he can merge and commit the result without worrying
about his own unrelated changes, and git is careful not to touch
anything in his working tree to cause information loss when the
changes do overlap [*2*].

He is committing something that he never tested himself in his
working tree as a whole.  The tree resulting from the merge
never existed outside his index file, so there is no way he
could have even compile tested it properly.  But for somebody
who is playing an integrator's role, it is not his primary job
to examine and test every change he merges in as a whole at
nitty-gritty level -- that is what the originator of the change
should have done.  So having uncommitted changes in the working
tree for an integrator person is not a sign of bad discipline at
all, and supporting this workflow _is_ important for git.

The primary reason I first got involved in git was because I
wanted to help the workflow of the kernel people, especially
Linus and the subsystem maintainers.  To be honest, I personally
still consider the kernel people the first tier customers for
me, and I stop and try to think twice when thinking about a
change or a new feature that may help individual developers and
newcomers, to make sure such a change does not make life less
convenient for the 'integrator' people.  Helping integrators to
be more efficient is important because they can become
bottlenecks.

*2* I once got yelled at by Linus when I carelessly broke this
feature and changed 'git-merge' to require a clean working tree
without changes before starting a merge; it was quickly
reverted.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  6:42                   ` Junio C Hamano
@ 2006-02-01  7:22                     ` Carl Worth
  2006-02-01  8:26                       ` Junio C Hamano
  2006-02-01 17:11                     ` Linus Torvalds
  2006-02-01 17:18                     ` Nicolas Pitre
  2 siblings, 1 reply; 84+ messages in thread
From: Carl Worth @ 2006-02-01  7:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

[-- Attachment #1: Type: text/plain, Size: 2269 bytes --]

On Tue, 31 Jan 2006 22:42:05 -0800, Junio C Hamano wrote:
> 
> There is no question we do not commit "another-file" and we do
> commit changes to the "existing-file" as a whole.  What should
> we do to "a-new-file", and how do we explain why we do so to
> novices?

I'll offer a couple of ill-informed comments from a novice's
point-of-view if I may.

My first exposure to git (about 1 week ago) was "A short git
tutorial" [*]

I found the discussion of the index, git-update-index, and the subtle
distinctions between the various git-diff commands rather intimidating
for an initial introduction. After getting to know the system better
over the past week, it seems it should be possible to have a class of
"novice ready" tools that provide for common use cases and that never
require any mention of the index in their documentation. If so, that
seems to me a useful goal to work toward and a useful guide in this
discussion.

> We could make "git commit" without paths to mean the current
> "-a" behaviour, which would match CVS behaviour more closely.

Again, my novice experience leads me to favor that change. After
reading the tutorial, I had the following sequence in mind for
committing an edited file:

	git update-index edited-file
	git commit

which seemed like more pain than strictly necessary. The next day,
when I went to the linux.conf.au tutorial and saw Linus use:

	git commit -a

for the same operation it was a breath of fresh air. I was left
scratching my head wondering why the -a behavior wasn't the default
for "git commit" with no paths.

> However, it would make commit after a merge conflict resolution
> in a dirty working tree _very_ dangerous -- it may give more
> familiar feel to CVS people, but it is not an improvement for
> git people at all.  I would rather not.

I'm still not "git people" I guess. Could you explain what the danger
is here? And is it something the tool could detect and prevent?

-Carl

[*] http://www.kernel.org/pub/software/scm/git/docs/core-tutorial.html

A better initial introduction for me would likely have been "A
tutorial introduction to git":

http://www.kernel.org/pub/software/scm/git/docs/tutorial.html

so a link to the latter from the first paragraph or so of the former
might be very helpful.


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  3:43                         ` Linus Torvalds
@ 2006-02-01  7:03                           ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01  7:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> Your point that we discussed a similar flag for the "don't require a full 
> checkout" is a good one: we should try to make sure that it works for both 
> uses. Although maybe we decided for some reason that nobody cared about 
> the non-checked-out case?

We gave them a way to add --cacheinfo but did not do any more
than that, because they are independently coming up with some
hash (not necessarily a proper git blob object name), they
did not have the huge blob data with the working tree anyway,
and the only thing they cared about was which paths changed and
they did not even want to see how the contents changed.
I.e. "diff-tree -r" was the only thing they cared about.

If we end up doing "assume unchanged", I should remember to do a
sensible thing for "diff-index" without --cached.  It should not
look at the working tree file for paths marked as such.  This
implies one optimization in "diff-index -p" and "diff-tree -p"
may need to be disabled.  They cheat and avoid expanding blob
objects when their cache entries are clean and required blobs
are in the working tree.  If "assume unchanged" path was
actually changed, such a diff would show up as a confusing
unexpected change.

Well, the user is asking for it, so that confusion is not _my_
problem, though ;-).

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  0:38                 ` Linus Torvalds
  2006-02-01  0:52                   ` Junio C Hamano
  2006-02-01  2:19                   ` Daniel Barkalow
@ 2006-02-01  6:42                   ` Junio C Hamano
  2006-02-01  7:22                     ` Carl Worth
                                       ` (2 more replies)
  2 siblings, 3 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01  6:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> Oh, one final suggestion: if you give a filename to "git commit", and you 
> do the new semantics which means something _different_ than "do a 
> git-update-index on that file and commit", then I'd really suggest that 
> the _old_ index for that filename should match the parent exactly. 
> Otherwise, you may have done a
>
> 	git diff filename
>
> and you _thought_ you were committing just a two-line thing (because you 
> didn't understand about the index), but another, earlier, action caused 
> the index to be different from the file you had in HEAD, and in reality 
> you're actually committing a much bigger diff.

This "I thought I was only checking in the two-liner I did as
the last step but you committed the whole thing, stupid git!"
confusion feels to be a parallel of "I thought I was only
checking in the files I specified on the command line but you
also committed the files I earlier git-add'ed, stupid git!"
confusion.

Taken together with your "during a partially conflicted merge"
example, it feels to me that the simplest safety valve would be
to refuse "git commit paths..." if the index does not exactly
match HEAD.  Not just mentioned paths but anywhere.

People who do not like this can set in their config file some
flag, say, 'core.index = understood', to get the current
behaviour.

The reason I am bringing this up is because of this command
sequence:

	# start from a clean tree, after 'git reset --hard'
	$ create a-new-file
	$ git add a-new-file
	$ edit existing-file
	$ edit another-file
	$ git commit existing-file

There is no question we do not commit "another-file" and we do
commit changes to the "existing-file" as a whole.  What should
we do to "a-new-file", and how do we explain why we do so to
novices?
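
For reference, a minimal sketch of how this sequence plays out in later git
releases, where "git commit <paths>" ended up committing only the named
paths. That outcome postdates this mail; the scratch repository, the file
contents, and the identity settings below are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: replays the sequence above in a throwaway repository
# using a later git release (behaviour not yet decided in this thread).
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Start from a clean tree.
echo base > existing-file
echo base > another-file
git add existing-file another-file
git commit -q -m "start from a clean tree"

echo new > a-new-file
git add a-new-file
echo edit >> existing-file
echo edit >> another-file
git commit -q -m "commit one path" existing-file

# What went into the commit, what is still staged, what is still unstaged.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
staged=$(git diff-index --cached --name-only HEAD)
unstaged=$(git diff-files --name-only)
printf 'committed: %s\nstill staged: %s\nunstaged: %s\n' \
    "$committed" "$staged" "$unstaged"

cd /
rm -rf "$repo"
```

In that model "a-new-file" stays staged for a later commit, which is the
"we shouldn't, the commit arguments do not mention it" answer.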

We can argue it either way.  We could say we shouldn't because
"commit" argument does not mention it.  We could say we should
because the user already told that he wants to add that file to
git.  Either makes sort-of sense from what the end user did.

I think a file "cvs add"ed is committed if whole subdirectory
commit (similar to our "commit -a") is done or the file is
explicitly specified on the "cvs commit" command line, and that
may match people's expectations.  That's an argument for not
committing "a-new-file".  But to be consistent with that, this
should not commit anything:

	# the same clean tree.
	$ create a-new-file
	$ git add a-new-file
	$ git commit

Which is counterintuitive to me by now (because I played too
long with git).

We could make "git commit" without paths to mean the current
"-a" behaviour, which would match CVS behaviour more closely.
However, it would make commit after a merge conflict resolution
in a dirty working tree _very_ dangerous -- it may give more
familiar feel to CVS people, but it is not an improvement for
git people at all.  I would rather not.

Right now, "git add" means "stage this for the next commit in
the index".  If we change the semantics of "git add" to mean "I
am not adding it for the next commit yet; I am just letting you
know there is a file in the working tree so that you can keep an
eye on it for me", using the intent-to-add index entry I've
mentioned a couple of times, I think the above problem might
naturally be solved.  For people who do not use update-index,
"commit -a" and "commit paths..." are the only two ways to
actually check-in anything to the index file for the next
commit ("git add" alone does not count).  "commit -a" would do
the equivalent of current "update all the not-up-to-date files to
the index and then commit", which would include the intent-to-add
paths.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
       [not found]                         ` <20060201045337.GC25753@mail.com>
  2006-02-01  5:04                           ` Linus Torvalds
@ 2006-02-01  5:42                           ` Junio C Hamano
  1 sibling, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01  5:42 UTC (permalink / raw)
  To: Ray Lehtiniemi; +Cc: Linus Torvalds, git

Ray Lehtiniemi <rayl@mail.com> writes:

> what if the user wants to change the mode bits of an assume-unchanged
> file with the twiddled permissions, but forgets to clear the flag
> first?  seems like that change is likely to get lost, especially if the
> new mode is read-only....

No problem, since we only record u+x bit and nothing else.  Most
importantly, we do not record any of the +w bits.
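
A small sketch of the point, assuming a filesystem where git trusts the
executable bit (core.fileMode true, the usual case on Linux). The file name
and the scratch repository are made up for illustration.

```shell
#!/bin/sh
# Sketch only: shows which permission changes reach the index.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

echo data > tool.sh

# Make the file read-only and stage it: the index still records 100644.
chmod 444 tool.sh
git add tool.sh
mode_readonly=$(git ls-files -s tool.sh | cut -d' ' -f1)

# Set the execute bit and stage again: now the index records 100755.
chmod 755 tool.sh
git add tool.sh
mode_exec=$(git ls-files -s tool.sh | cut -d' ' -f1)

echo "read-only file recorded as $mode_readonly"
echo "executable file recorded as $mode_exec"

cd /
rm -rf "$repo"
```

So twiddling the write bit on an assume-unchanged file cannot lose a mode
change, because no write bit is ever stored.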

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
       [not found]                         ` <20060201045337.GC25753@mail.com>
@ 2006-02-01  5:04                           ` Linus Torvalds
  2006-02-01  5:42                           ` Junio C Hamano
  1 sibling, 0 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01  5:04 UTC (permalink / raw)
  To: Ray Lehtiniemi; +Cc: Junio C Hamano, git



On Tue, 31 Jan 2006, Ray Lehtiniemi wrote:
> 
> what if the user wants to change the mode bits of an assume-unchanged
> file with the twiddled permissions, but forgets to clear the flag
> first?  seems like that change is likely to get lost, especially if the
> new mode is read-only....

Remember - git only cares about execute permissions. The write permissions 
are entirely ignored by git ..

		Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:52                     ` Martin Langhoff
@ 2006-02-01  3:48                       ` Linus Torvalds
  2006-02-01 19:30                         ` H. Peter Anvin
  2006-02-01 14:55                       ` Alex Riesen
  1 sibling, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01  3:48 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Ray Lehtiniemi, Alex Riesen, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Git Mailing List



On Wed, 1 Feb 2006, Martin Langhoff wrote:
> 
> If you have such a tree, your workflow _must_ be such that you know
> exactly what files you have changed. Asking any tool to go out and
> "find which of my 20K files has changed" is doable, but it's just
> magic that it works on recent linuxes.

It's not magic, and it's not all that recent. Linux FS ops have always 
been pretty good, and the dentry cache was introduced in 2.0.x, I think, 
so you'd be hard-pressed to find a Linux system that doesn't have it.

Now, I bet Linux will be better (often by a factor of 2-3) than most other 
systems, but that still doesn't mean that 20k files is totally 
unreasonable on other setups. 

I suspect cygwin is worse than most because (a) the NT VFS layer is 
piss-poor and you need a kernel service to get good performance and (b) 
cygwin probably adds its own overhead for handling symlinks, so the 
"lstat()" call is probably even more expensive.

Now, the networked filesystems are a potential problem for everybody.

		Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:31                       ` Junio C Hamano
@ 2006-02-01  3:43                         ` Linus Torvalds
  2006-02-01  7:03                           ` Junio C Hamano
       [not found]                         ` <20060201045337.GC25753@mail.com>
  1 sibling, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01  3:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Tue, 31 Jan 2006, Junio C Hamano wrote:
> 
> I think this should work fine as a mechanism, but I am a bit
> worried about the convenience and safety aspect.  It _might_
> make sense to do what RCS does; check out read-only copy by
> default and set the "assume unchanged" flag, to prevent people
> from accidentally modifying the working tree copy without
> telling the index about it.

Yes, I think the "assume unchanged" flag goes well together with making 
sure that the checked-out file is non-writable at the time.

Of course, any number of editors and other actions won't care: if you do 
anything like

	for i in *.c
	do
		sed 's/xyzzy/bas/g' < $i > $i.new
		mv $i.new $i
	done

you'll never have even noticed that the old file was marked read-only. So 
it's obviously not in any way any guarantee, but it probably makes sense 
as a crutch.
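
The effect can be reproduced without git at all. A minimal sketch in a
scratch directory (the file name is made up): the rename only needs write
permission on the directory, so the read-only bit on the old file never gets
in the way. Note that "mv" may ask for confirmation when run from a
terminal, hence the "-f" here.

```shell
#!/bin/sh
# Sketch only: replace a read-only file via write-new-then-rename.
set -eu
dir=$(mktemp -d)
cd "$dir"

echo 'xyzzy' > demo.c
chmod a-w demo.c                      # pretend the checkout made it read-only
inode_before=$(ls -i demo.c | awk '{print $1}')

sed 's/xyzzy/bas/g' < demo.c > demo.c.new
mv -f demo.c.new demo.c               # -f: do not prompt about the read-only target

inode_after=$(ls -i demo.c | awk '{print $1}')
content=$(cat demo.c)
echo "content is now: $content"
echo "inode before: $inode_before, after: $inode_after"

cd /
rm -rf "$dir"
```

The file content changes and the file is a different inode afterwards, so
nothing short of looking at the file again would notice.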

Your point that we discussed a similar flag for the "don't require a full 
checkout" is a good one: we should try to make sure that it works for both 
uses. Although maybe we decided for some reason that nobody cared about 
the non-checked-out case?

		Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
       [not found]                   ` <20060201013901.GA16832@mail.com>
  2006-02-01  2:04                     ` Linus Torvalds
@ 2006-02-01  2:52                     ` Martin Langhoff
  2006-02-01  3:48                       ` Linus Torvalds
  2006-02-01 14:55                       ` Alex Riesen
  1 sibling, 2 replies; 84+ messages in thread
From: Martin Langhoff @ 2006-02-01  2:52 UTC (permalink / raw)
  To: Ray Lehtiniemi
  Cc: Alex Riesen, Linus Torvalds, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Git Mailing List

On 2/1/06, Ray Lehtiniemi <rayl@mail.com> wrote:
> by various VAR companies.  the tree in question has ~20,000 files
> totalling nearly 1.4 GB
...
>   reiserfs$ time git update-index --refresh

If you have such a tree, your workflow _must_ be such that you know
exactly what files you have changed. Asking any tool to go out and
"find which of my 20K files has changed" is doable, but it's just
magic that it works on recent linuxes.

> for comparison, one of our sandboxes is sitting on an NTFS file system,
> accessed via SMB:

you have the samba stack, network, SMB/CIFS stack and NTFS itself in
the middle. Replace the ethernet with carrier pigeons for a more
complete picture ;-)

Perhaps a local git/cygwin on NTFS would be more reasonable to benchmark?

cheers,


martin

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
  2006-02-01  2:09                       ` Linus Torvalds
@ 2006-02-01  2:31                       ` Junio C Hamano
  2006-02-01  3:43                         ` Linus Torvalds
       [not found]                         ` <20060201045337.GC25753@mail.com>
  2006-02-01 16:15                       ` Jason Riedy
                                         ` (2 subsequent siblings)
  4 siblings, 2 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01  2:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> They're fast, because they are purely in the cache (well, git-update-index 
> obviously isn't, but the new op wouldn't be any _slower_ than the old 
> one).
>
> Looks simple enough. The big thing to remember is to clear that 
> "implicitly up-to-date" flag whenever we make changes (ie we'd probably 
> make "add_cache_entry()" always clear it, possibly with a flag to add it 
> as "pre-verified" which would set it).
>
> Comments? Junio, what do you think?

Somehow this reminds me of a "feature" we added quite a long
time ago to support "update-index without working tree".

I think this should work fine as a mechanism, but I am a bit
worried about the convenience and safety aspect.  It _might_
make sense to do what RCS does; check out read-only copy by
default and set the "assume unchanged" flag, to prevent people
from accidentally modifying the working tree copy without
telling the index about it.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  0:38                 ` Linus Torvalds
  2006-02-01  0:52                   ` Junio C Hamano
@ 2006-02-01  2:19                   ` Daniel Barkalow
  2006-02-01  6:42                   ` Junio C Hamano
  2 siblings, 0 replies; 84+ messages in thread
From: Daniel Barkalow @ 2006-02-01  2:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Johannes Schindelin, Carl Baldwin, Keith Packard,
	Martin Langhoff, Git Mailing List

On Tue, 31 Jan 2006, Linus Torvalds wrote:

> So if you do this change (which may be the right one) then please make 
> sure that "git commit <filename>" doesn't work _at_all_ when a merge is in 
> progress (ie MERGE_HEAD exists), because it would do the wrong thing.

Agreed. I suppose it could accept doing a commit of only a few files which 
weren't touched by the merge, but I don't think even you multitask enough 
to want to do that; anyway, the user can just ditch the merge, commit 
their stuff, and try the merge again. (I bet this is a case where new 
users would be really surprised by the behavior of "git commit filename", 
except that they wouldn't think it would do anything other than give an 
error.)

> And yes, then I'll just have to force my fingers to do a simple
> 
> 	git-update-index filename
> 	git commit
> 
> instead. I can do that.
>
> Oh, one final suggestion: if you give a filename to "git commit", and you 
> do the new semantics which means something _different_ than "do a 
> git-update-index on that file and commit", then I'd really suggest that 
> the _old_ index for that filename should match the parent exactly. 
> Otherwise, you may have done a
> 
> 	git diff filename
> 
> and you _thought_ you were committing just a two-line thing (because you 
> didn't understand about the index), but another, earlier, action caused 
> the index to be different from the file you had in HEAD, and in reality 
> you're actually committing a much bigger diff.
> 
> In other words: if you want "git commit <filename>" to _not_ care about 
> the current index, then it should make sure that the index at least 
> _matches_ the current HEAD in the files mentioned.
> 
> Ie "git-diff-index --cached HEAD <filespec>" should return empty. Or 
> something like that.

Agreed here, too.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
@ 2006-02-01  2:09                       ` Linus Torvalds
  2006-02-01  2:31                       ` Junio C Hamano
                                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01  2:09 UTC (permalink / raw)
  To: Ray Lehtiniemi
  Cc: Alex Riesen, Radoslaw Szkodzinski, Keith Packard, Junio C Hamano,
	cworth, Martin Langhoff, Git Mailing List



On Tue, 31 Jan 2006, Linus Torvalds wrote:
> 
> We still have one unused bit in the cache-entry "ce_flags", so we wouldn't 
> even need to break any existing index files with it.

In case it wasn't clear, the _core_ of this optimization would be as 
simple as something like the appended.

The real meat is just making sure that CE_VALID gets set/cleared properly.

(That's also the most complex part, of course, but this trivial patch 
might help show the basic idea)

		Linus

---
diff --git a/cache.h b/cache.h
index bdbe2d6..7adc2e6 100644
--- a/cache.h
+++ b/cache.h
@@ -91,6 +91,7 @@ struct cache_entry {
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 #define CE_UPDATE    (0x4000)
+#define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
 #define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
diff --git a/read-cache.c b/read-cache.c
index c5474d4..738fe78 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -148,7 +148,16 @@ static int ce_match_stat_basic(struct ca
 
 int ce_match_stat(struct cache_entry *ce, struct stat *st)
 {
-	unsigned int changed = ce_match_stat_basic(ce, st);
+	unsigned int changed;
+
+	/*
+	 * If it's marked as always valid in the index, it's 
+	 * valid whatever the checked-out copy says
+	 */
+	if (ce->ce_flags & htons(CE_VALID))
+		return 0;
+
+	changed = ce_match_stat_basic(ce, st);
 
 	/*
 	 * Within 1 second of this sequence:

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
       [not found]                   ` <20060201013901.GA16832@mail.com>
@ 2006-02-01  2:04                     ` Linus Torvalds
  2006-02-01  2:09                       ` Linus Torvalds
                                         ` (4 more replies)
  2006-02-01  2:52                     ` Martin Langhoff
  1 sibling, 5 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01  2:04 UTC (permalink / raw)
  To: Ray Lehtiniemi
  Cc: Alex Riesen, Radoslaw Szkodzinski, Keith Packard, Junio C Hamano,
	cworth, Martin Langhoff, Git Mailing List



On Tue, 31 Jan 2006, Ray Lehtiniemi wrote:
> 
> for what it's worth, it's certainly true here...  i'm using git to help
> me manage a similar project where i work.

Hmm.

We _could_ actually fairly easily add a flag to the index which means 
"don't even bother comparing - assume same", and then have specific 
operations to clear that flag.

That would allow people with slow filesystems (not just Windows: even 
under Linux, the cold-cache case is always going to be pretty slow) to 
have a _choice_: they could continue to use git as it is done now (explicit 
checks), _or_ they could mark all their index caches as "implicitly 
up-to-date" and use a separate program to mark them as being potentially 
edited.

We still have one unused bit in the cache-entry "ce_flags", so we wouldn't 
even need to break any existing index files with it.

We'd just need to have two new (fast) operations:

 - mark one or more files as being "implicitly up-to-date"

   "git checkout" would do this if the proper flag was set in the 
   .git/config file.

   "git-update-index --refresh" would do this for files that weren't 
   already implicitly up-to-date _and_ the refresh actually showed it to 
   match (and the .git/config file said so).

 - mark one or more files as _not_ being implicitly up-to-date:

   people would do this by hand when editing a file (or when just deciding 
   that they want git to re-check everything again)

They're fast, because they are purely in the cache (well, git-update-index 
obviously isn't, but the new op wouldn't be any _slower_ than the old 
one).

Looks simple enough. The big thing to remember is to clear that 
"implicitly up-to-date" flag whenever we make changes (ie we'd probably 
make "add_cache_entry()" always clear it, possibly with a flag to add it 
as "pre-verified" which would set it).
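
A sketch of the two operations as they can be driven in later git releases,
where this bit is exposed as "git update-index --assume-unchanged" and
"--no-assume-unchanged". Those option names did not exist when this was
written; the scratch repository and file name are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: set and clear the "implicitly up-to-date" bit on one path.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo one > file.c
git add file.c
git commit -q -m "initial"

# Operation 1: mark the path as implicitly up-to-date.
git update-index --assume-unchanged file.c
flagged=$(git ls-files -v file.c)     # a lowercase tag letter means the bit is set

# Edit behind git's back: the working tree copy is no longer compared.
echo "two, edited" > file.c
hidden=$(git diff-files --name-only)

# Operation 2: clear the bit so git checks the file again.
git update-index --no-assume-unchanged file.c
seen=$(git diff-files --name-only)

printf 'flag listing: %s\nwhile flagged: [%s]\nafter clearing: [%s]\n' \
    "$flagged" "$hidden" "$seen"

cd /
rm -rf "$repo"
```

While the bit is set the edit is invisible, which is exactly the risk the
"mark by hand when editing" step is meant to cover.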

Comments? Junio, what do you think?

> we're working on a vendor supplied tree which is also hacked upon
> by various VAR companies.  the tree in question has ~20,000 files
> totalling nearly 1.4 GB of source files, ms word docs, binary-only
> libraries for a wide array of processor variants, windows exe
> files, video clips, etc.  (however, the amount of actual source code
> interspersed in there is only about 6000 files totaling about 112 MB)
> 
> here's a repo sitting on the local linux filesystem with cold cache:
> 
>   reiserfs$ time git update-index --refresh
>    real    0m17.422s
>    user    0m0.025s
>    sys     0m0.320s

.. somewhat painful, but with enough memory this is hopefully a pretty 
rare case.

> and with hot cache
> 
>   reiserfs$ time git update-index --refresh
>    real    0m0.151s
>    user    0m0.020s
>    sys     0m0.067s

This is how it _should_ look.

But:

> for comparison, one of our sandboxes is sitting on an NTFS file system,
> accessed via SMB:
> 
>   smbfs$ time git update-index --refresh
>   real    11m36.502s
>   user    0m6.830s
>   sys     0m5.086s

Ouch, ouch, ouch.

Sounds like every single stat() will go out over the wire. I forget what the 
Linux NFS client does, but I _think_ it has a metadata timeout that avoids 
this. But it might be as bad under NFS.

Has anybody used git over NFS? If it's this bad (or even close to), I 
guess the "mark files as up-to-date in the index" approach is a really 
good idea..

Of course, the whole point of git is that you should keep your repository 
close, but sometimes NFS - or similar - is enforced upon you by other 
issues, like the fact that the powers-that-be want anonymous workstations 
and everybody should work with a home-directory automounted over NFS..

			Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  0:38                 ` Linus Torvalds
@ 2006-02-01  0:52                   ` Junio C Hamano
  2006-02-01  2:19                   ` Daniel Barkalow
  2006-02-01  6:42                   ` Junio C Hamano
  2 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-02-01  0:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> One thing to be careful about is merges.
> ...
> So the current "git commit filename" behaviour is actually the only 
> possible correct one for a merge. Nothing else makes any sense 
> what-so-ever.

Agreed 100%, and I kind of feel silly about not mentioning that
myself.  It _might_ even make sense to reject explicit filenames
when MERGE_HEAD does not exist ;-).

> Oh, one final suggestion: if you give a filename to "git
> commit", and you do the new semantics which means something
> _different_ than "do a git-update-index on that file and
> commit", then I'd really suggest that the _old_ index for that
> filename should match the parent exactly.

That is also a good safety measure.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 23:47               ` Junio C Hamano
@ 2006-02-01  0:38                 ` Linus Torvalds
  2006-02-01  0:52                   ` Junio C Hamano
                                     ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-02-01  0:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Daniel Barkalow, Johannes Schindelin, Carl Baldwin,
	Keith Packard, Martin Langhoff, Git Mailing List



On Tue, 31 Jan 2006, Junio C Hamano wrote:
> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > I sort of suspect that "git commit some_other_file" should really read 
> > HEAD into a temporary index, update "some_other_file" in that (and the 
> > main index), and commit it.
> > ...
> > The surprising thing is that "git commit path ..." means
> > "everything I've already mentioned, plus path..." not just
> > "path ...", and it's particularly surprising because people
> > only tend to specify paths when they've done something they
> > don't want to commit.
> 
> Interesting idea, and a good point.

One thing to be careful about is merges.

This actually happens to me:

	git pull ....

	.. uhhuh, trivial conflict in one file ..
	.. edit the/file/that/conflicted ..

	git commit the/file/that/conflicted

and there is no way that it would ever be correct to then just commit that 
one file. The fact that it's a merge means that the rest of the index - 
which is all from the merge, and correct - absolutely _must_ be committed 
too.

And yes, I could use "git commit -a" (and I often do), but the thing is, I 
surprisingly often have edits in unrelated files (stuff that the merge 
never touched), and doing "git commit -a" would do the wrong thing.

So the current "git commit filename" behaviour is actually the only 
possible correct one for a merge. Nothing else makes any sense 
what-so-ever.

Now, I can hear people arguing that "ok, merges are special, and for 
merges we always do it in the current index", but that makes "git commit 
pathname" act very _differently_ for a merge than for a normal commit. 
That just smells wrong to me.

So if you do this change (which may be the right one) then please make 
sure that "git commit <filename>" doesn't work _at_all_ when a merge is in 
progress (ie MERGE_HEAD exists), because it would do the wrong thing.

And yes, then I'll just have to force my fingers to do a simple

	git-update-index filename
	git commit

instead. I can do that.

Oh, one final suggestion: if you give a filename to "git commit", and you 
do the new semantics which means something _different_ than "do a 
git-update-index on that file and commit", then I'd really suggest that 
the _old_ index for that filename should match the parent exactly. 
Otherwise, you may have done a

	git diff filename

and you _thought_ you were committing just a two-line thing (because you 
didn't understand about the index), but another, earlier, action caused 
the index to be different from the file you had in HEAD, and in reality 
you're actually committing a much bigger diff.

In other words: if you want "git commit <filename>" to _not_ care about 
the current index, then it should make sure that the index at least 
_matches_ the current HEAD in the files mentioned.

Ie "git-diff-index --cached HEAD <filespec>" should return empty. Or 
something like that.
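
A rough sketch of both safety checks as a wrapper, under stated assumptions:
"can_commit_only" is a hypothetical helper name, not a git command, and the
demo repository is a throwaway. It refuses when MERGE_HEAD exists, and
otherwise requires "git diff-index --cached HEAD" to be empty for the named
paths.

```shell
#!/bin/sh
# Sketch only: pre-flight checks before a "commit just these paths".
set -eu

# Hypothetical helper: succeeds only if a paths-only commit would be safe.
can_commit_only() {
    git_dir=$(git rev-parse --git-dir) || return 2
    # First condition: never during a merge.
    if [ -f "$git_dir/MERGE_HEAD" ]; then
        echo "merge in progress, refusing" >&2
        return 1
    fi
    # Second condition: the index must match HEAD for the named paths.
    if ! git diff-index --cached --quiet HEAD -- "$@"; then
        echo "index differs from HEAD for: $*" >&2
        return 1
    fi
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
echo one > file.c
git add file.c
git commit -q -m "initial"

echo "two-line edit" >> file.c        # working tree edit only
if can_commit_only file.c; then clean=ok; else clean=refused; fi

git update-index file.c               # an earlier action already staged the path
echo "another edit" >> file.c
if can_commit_only file.c; then staged=ok; else staged=refused; fi

echo "unstaged edit only: $clean; previously staged: $staged"

cd /
rm -rf "$repo"
```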

			Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 23:16             ` Daniel Barkalow
  2006-01-31 23:36               ` Petr Baudis
@ 2006-01-31 23:47               ` Junio C Hamano
  2006-02-01  0:38                 ` Linus Torvalds
  1 sibling, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2006-01-31 23:47 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Johannes Schindelin, Carl Baldwin, Keith Packard,
	Martin Langhoff, Linus Torvalds, Git Mailing List

Daniel Barkalow <barkalow@iabervon.org> writes:

> I sort of suspect that "git commit some_other_file" should really read 
> HEAD into a temporary index, update "some_other_file" in that (and the 
> main index), and commit it.
> ...
> The surprising thing is that "git commit path ..." means
> "everything I've already mentioned, plus path..." not just
> "path ...", and it's particularly surprising because people
> only tend to specify paths when they've done something they
> don't want to commit.

Interesting idea, and a good point.

Not that I particularly would like to encourage people to make
partial commits by making it easier, but as long as we allow our
users to say "commit path...", your proposal would reduce the
confusion.

I wonder which is faster, to check if index differs from HEAD
and do the temporary index only when they differ, or always use
a temporary without checking?  The former needs one diff-index
--cached, zero or one read-tree, one write-tree and one
commit-tree.  The latter always needs one read-tree, one
write-tree and one commit-tree.

Wait.  We already do diff-index --cached during git-commit
anyway (it is in git-status).  Maybe with a bit of code
restructuring we can make the temporary index part optional.
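
The "always use a temporary index" variant can be sketched with plumbing
commands: one read-tree, one write-tree and one commit-tree, as counted
above. "commit_only" is a hypothetical helper name, the temporary index path
is an arbitrary choice, and removed paths are not handled.

```shell
#!/bin/sh
# Sketch only: commit just the named paths through a temporary index.
set -eu

# Hypothetical helper. Usage: commit_only "message" path...
commit_only() {
    msg=$1; shift
    git_dir=$(cd "$(git rev-parse --git-dir)" && pwd)
    tmp_index="$git_dir/commit-only-index.$$"

    # Read HEAD into a temporary index and update only the named paths there.
    GIT_INDEX_FILE="$tmp_index" git read-tree HEAD
    GIT_INDEX_FILE="$tmp_index" git update-index --add -- "$@"
    tree=$(GIT_INDEX_FILE="$tmp_index" git write-tree)
    rm -f "$tmp_index"

    # Commit that tree on top of HEAD, then update the main index as well.
    commit=$(git commit-tree "$tree" -p HEAD -m "$msg")
    git update-ref HEAD "$commit"
    git update-index --add -- "$@"
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
echo base > some_other_file
git add some_other_file
git commit -q -m "initial"

echo new > added_file
git add added_file                    # staged earlier, not named below
echo edit >> some_other_file
commit_only "only the named path" some_other_file

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
staged=$(git diff-index --cached --name-only HEAD)
echo "committed: $committed; still staged: $staged"

cd /
rm -rf "$repo"
```

The main index keeps whatever was staged before, so the earlier "git add"
is not lost; it simply waits for the next commit.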

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 23:16             ` Daniel Barkalow
@ 2006-01-31 23:36               ` Petr Baudis
  2006-01-31 23:47               ` Junio C Hamano
  1 sibling, 0 replies; 84+ messages in thread
From: Petr Baudis @ 2006-01-31 23:36 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Johannes Schindelin, Carl Baldwin, Junio C Hamano, Keith Packard,
	Martin Langhoff, Linus Torvalds, Git Mailing List

Dear diary, on Wed, Feb 01, 2006 at 12:16:26AM CET, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> said that...
> On Tue, 31 Jan 2006, Johannes Schindelin wrote:
> 
> > Hi,
> > 
> > On Mon, 30 Jan 2006, Carl Baldwin wrote:
> > 
> > > In general, I think it is grasping the reason for the index file and how 
> > > git commands like git-commit and git-diff interact with it.
> > 
> > IMHO this is the one big showstopper. I had problems explaining the 
> > concept myself.
> > 
> > For example, I had a hard time explaining to a friend why a git-add'ed 
> > file is committed when saying "git commit some_other_file", but not 
> > another (modified) file. Very unintuitive.
> 
> I sort of suspect that "git commit some_other_file" should really read 
> HEAD into a temporary index, update "some_other_file" in that (and the 
> main index), and commit it.

FWIW, this is also what cg-commit does.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 10:27           ` Johannes Schindelin
  2006-01-31 15:24             ` Carl Baldwin
  2006-01-31 17:30             ` Linus Torvalds
@ 2006-01-31 23:16             ` Daniel Barkalow
  2006-01-31 23:36               ` Petr Baudis
  2006-01-31 23:47               ` Junio C Hamano
  2 siblings, 2 replies; 84+ messages in thread
From: Daniel Barkalow @ 2006-01-31 23:16 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Carl Baldwin, Junio C Hamano, Keith Packard, Martin Langhoff,
	Linus Torvalds, Git Mailing List

On Tue, 31 Jan 2006, Johannes Schindelin wrote:

> Hi,
> 
> On Mon, 30 Jan 2006, Carl Baldwin wrote:
> 
> > In general, I think it is grasping the reason for the index file and how 
> > git commands like git-commit and git-diff interact with it.
> 
> IMHO this is the one big showstopper. I had problems explaining the 
> concept myself.
> 
> For example, I had a hard time explaining to a friend why a git-add'ed 
> file is committed when saying "git commit some_other_file", but not 
> another (modified) file. Very unintuitive.

I sort of suspect that "git commit some_other_file" should really read 
HEAD into a temporary index, update "some_other_file" in that (and the 
main index), and commit it. The concept of the index isn't hard (it's the 
preparation you've made so far towards a commit), and plain "git commit" 
makes sense with it; "git commit -a" also makes sense, since committing 
all changes is pretty clear. The surprising thing is that "git commit path 
..." means "everything I've already mentioned, plus path..." not just 
"path ...", and it's particularly surprising because people only tend to 
specify paths when they've done something they don't want to commit.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:21                 ` Linus Torvalds
@ 2006-01-31 22:55                   ` Joel Becker
  2006-02-01 14:43                     ` Johannes Schindelin
  0 siblings, 1 reply; 84+ messages in thread
From: Joel Becker @ 2006-01-31 22:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Keith Packard, Johannes Schindelin, Carl Baldwin, Junio C Hamano,
	Martin Langhoff, Git Mailing List

On Tue, Jan 31, 2006 at 11:21:52AM -0800, Linus Torvalds wrote:
> Now, I do agree. I don't actually like hiding the index too much. 
> Understanding the index is _invaluable_ whenever you're doing a merge with 
> conflicts, and understanding what tools are available to you to resolve 
> those conflicts.

	This is precisely the experience I've had explaining GIT to
folks moving to it.  The simplest workflow (clone; hack one file, commit
one file) is so similar to CVS/Subversion/Anything that it's immediately
understood.  But when pull, push, merge, and any non-linear history are
discussed, I have to describe the index and the commit/tree layout.
Once I do, they get it.

> So I'm actually of the "revel in the index" camp (as could probably be 
> guessed by the original tutorial).

	I'm going to second this, from a real-world "explain it to
others" standpoint.

Joel

-- 

"Every day I get up and look through the Forbes list of the richest
 people in America. If I'm not there, I go to work."
        - Robert Orben

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 20:56                 ` Sam Ravnborg
@ 2006-01-31 22:21                   ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-01-31 22:21 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: git

"Sam Ravnborg" <sam@ravnborg.org> writes:

> But the primary thing is cg-commit
> It gives you a list of files modified which can be edited and
> it has saved me a couple of times from committing too much.
> And I get vi fired up so no need to fiddle with command line arguments.

[this is what I sent in a separate message but I goofed up the
destination headers and the message did not appear on the list,
so I am reprinting.]

I have always felt "git commit paths..." was a mistake; it
encourages partial commits by individual developers.

By "partial commit", I mean a commit that does not exactly match
the state of the working tree when the commit is made.  There
are two kinds of "partial commits".  Good ones and bad ones.

Being able to make partial commits is handy for people whose
primary role is to integrate many changes from trusted
developers rather than testing each and every commit as a whole
(read: Linus and subsystem maintainers).  Integrators' job may
include testing what has been merged as a whole by a compile
and reboot cycle as the final "wrap-up" step, but the most
important role they play is to sanity check the changes from
architectural perspective.

For that workflow to work effectively, however, the changes fed
by individual developers to the integrators have to be clean and
well tested.  A partial commit records something that never
existed in any working tree as a whole, so by definition it is
an untested change.  You would risk "sorry I forgot to commit
the changes to these paths but without them it does not even
compile", and end up wasting integrators' time.

The integrators make commits out of their working trees using
git-merge and git-apply to record changes made by others after
reviewing them.  These commands ignore unconflicting local
changes (but notice conflicting ones in order to operate correctly), and
allow them to make partial commits.  This is a good thing;
otherwise they would have to reset their own changes in their
working tree, only to do merges and to accept patches.  However,
people playing the integrator role rarely have reason to use
"git commit paths..." while merging from others to make such a
partial commit.  Only after they resolve conflicts by hand,
perhaps.  But that happens far less often than careless
individual developers making partial commits of the bad kind using
the same "git commit paths..." command.

This is the reason why I feel "git commit paths..." is a bad
feature.  It helps to make bad partial commits, without having
to do much with making good partial commits.

Many SCMs may have the ability to do "commit paths...", but that
does not change the fact that it encourages carelessness for
individual developers, which is especially bad in a distributed
development workflow like the Linux kernel style [*1*].

But that was not my change ;-).


[Footnote]

*1* It could be argued that being able to do partial commit is a
good thing in other SCM systems where there is no equivalent to
our "index" file.  It is one way for the developer to snapshot
their work-in-progress state where they might later come back to
if the approach they are currently pursuing does not pan out.
But for that, we have the index file we can "check into" without
committing.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 21:25               ` Linus Torvalds
  2006-01-31 21:52                 ` J. Bruce Fields
@ 2006-01-31 22:01                 ` Alex Riesen
       [not found]                   ` <20060201013901.GA16832@mail.com>
  1 sibling, 1 reply; 84+ messages in thread
From: Alex Riesen @ 2006-01-31 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Radoslaw Szkodzinski, Keith Packard, Junio C Hamano, cworth,
	Martin Langhoff, Git Mailing List

Linus Torvalds, Tue, Jan 31, 2006 22:25:08 +0100:
> > I use git in cygwin for a project with more than 17k files (almost
> > 6M lines).  It's real slow on ntfs (on 3.2GHz PIV!)
> ...
> So we could speed it up on cygwin (and yes, it would speed git up a lot 
> even on Linux, but since the cached lstat() case is so fast anyway, I 
> doubt a lot of Linux users care - the biggest win would be on a cold-cache 
> tree).  But it would require that you explicitly _mark_ the files you edit 
> some way.

I'd hate to have to do that. The project in question is just stuffed
up beyond all reason, windows' VFS is a sorry piece of junk, and I
care much more about how comfortable the tool is.

> ...
> For small projects (or big projects with fairly few files), it really 
> shouldn't matter. Your 17k files example is hopefully fairly rare..

I'd say it is fairly common. It's what projects in big companies,
driven by paranoia and suffering from chronic undereducation,
usually end up with. Frequently right from the start...

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 21:25               ` Linus Torvalds
@ 2006-01-31 21:52                 ` J. Bruce Fields
  2006-01-31 22:01                 ` Alex Riesen
  1 sibling, 0 replies; 84+ messages in thread
From: J. Bruce Fields @ 2006-01-31 21:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alex Riesen, Radoslaw Szkodzinski, Keith Packard, Junio C Hamano,
	cworth, Martin Langhoff, Git Mailing List

On Tue, Jan 31, 2006 at 01:25:08PM -0800, Linus Torvalds wrote:
> So we could speed it up on cygwin (and yes, it would speed git up a lot 
> even on Linux, but since the cached lstat() case is so fast anyway, I 
> doubt a lot of Linux users care - the biggest win would be on a cold-cache 
> tree).  But it would require that you explicitly _mark_ the files you edit 
> some way.

You couldn't depend on a combination of lstat's and some kind of
filesystem change notifications?

--b.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-30 22:51             ` Alex Riesen
@ 2006-01-31 21:25               ` Linus Torvalds
  2006-01-31 21:52                 ` J. Bruce Fields
  2006-01-31 22:01                 ` Alex Riesen
  0 siblings, 2 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-01-31 21:25 UTC (permalink / raw)
  To: Alex Riesen
  Cc: Radoslaw Szkodzinski, Keith Packard, Junio C Hamano, cworth,
	Martin Langhoff, Git Mailing List



On Mon, 30 Jan 2006, Alex Riesen wrote:
> 
> I use git in cygwin for a project with more than 17k files (almost 6M lines).
> It's real slow on ntfs (on 3.2GHz PIV!)

One thing that git does rely on is a fast "lstat()" system call. The index 
file means that we almost never need to read the contents of a file to 
compare, but git _does_ check that files haven't been modified, and doing 
an "lstat()" on every single file it knows about is the way to do that.
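
A minimal sketch of that check, assuming a cache that records only
mtime, size and inode number per path (the real index records more
fields; the names snapshot and possibly_modified are hypothetical):

```python
import os
import tempfile


def snapshot(paths):
    """Record cheap lstat() metadata for each path (a toy 'index')."""
    cache = {}
    for p in paths:
        st = os.lstat(p)
        cache[p] = (st.st_mtime_ns, st.st_size, st.st_ino)
    return cache


def possibly_modified(cache):
    """Return paths whose metadata no longer matches the snapshot.
    No file contents are read; only lstat() is called, once per path."""
    changed = []
    for p, recorded in cache.items():
        st = os.lstat(p)
        if (st.st_mtime_ns, st.st_size, st.st_ino) != recorded:
            changed.append(p)
    return changed


with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a.c")
    b = os.path.join(d, "b.c")
    for path in (a, b):
        with open(path, "w") as f:
            f.write("int x;\n")
    cache = snapshot([a, b])
    with open(b, "w") as f:
        f.write("int x, y;\n")  # size changes, so this is detected
    changed = possibly_modified(cache)
    print([os.path.basename(p) for p in changed])  # ['b.c']
```

The cost is one lstat() per tracked file on every check, which is why
the speed of that call dominates on a tree with many files.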

Now, I suspect that you simply can't do basic filename lookups much faster 
than Linux does them. The Linux VFS layer name caching reigns supreme: the 
dentries are just incredibly powerful, and the reason Linux kicks ass on 
many benchmarks.

And yes, git was designed for it. git is _really_ fast on Linux, but any 
operating system that is so stupid that it has to call down to the 
low-level filesystem for filename lookup (which is most of them, and from 
what I have heard, the NT VFS layer is worse than most) will take a lot 
longer.

This is sadly not something I think you can possibly avoid. Git is 
literally being as fast as is humanly possible without doing explicit 
locking. You _can_ avoid the "lstat()" calls if you are willing to always 
explicitly mark files that you have changed (so that the SCM can stat just 
_those_ files and ignore all the others), but I personally much prefer 
being able to use any random tools on the files without having to prepare 
them some way.

So we could speed it up on cygwin (and yes, it would speed git up a lot 
even on Linux, but since the cached lstat() case is so fast anyway, I 
doubt a lot of Linux users care - the biggest win would be on a cold-cache 
tree).  But it would require that you explicitly _mark_ the files you edit 
some way.

Btw, BK wanted that, and it wasn't _too_ painful. You had to do

	bk edit

to mark a file as being ready to be dirtied, and as a helper command you 
would use

	bk editor

which would first do the "bk edit" thing and then start up your favourite 
editor (the usual ${EDITOR:-${VISUAL:-vi}} rules applied) on it, and it 
worked fine. We _could_ do the same in git.
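
As a rough sketch of what such a scheme involves (the class MarkedTree
and its methods are hypothetical, not anything BK or git provides; only
the EDITOR/VISUAL/vi fallback order is taken from the text above):

```python
import os


def pick_editor(env=None):
    """Return $EDITOR, else $VISUAL, else 'vi' (the order given above)."""
    env = os.environ if env is None else env
    return env.get("EDITOR") or env.get("VISUAL") or "vi"


class MarkedTree:
    """Hypothetical 'explicit edit' bookkeeping: only paths that were
    marked ever need to be stat'ed later."""

    def __init__(self):
        self.marked = set()

    def edit(self, path):
        # Analogous to "bk edit": mark the file as about to be dirtied.
        self.marked.add(path)

    def editor_command(self, path, env=None):
        # Analogous to "bk editor": mark first, then build the editor
        # command line (it is returned here, not executed).
        self.edit(path)
        return [pick_editor(env), path]

    def paths_to_check(self):
        return sorted(self.marked)


tree = MarkedTree()
cmd = tree.editor_command("fs/inode.c", env={"VISUAL": "emacs"})
print(cmd)                    # ['emacs', 'fs/inode.c']
print(tree.paths_to_check())  # ['fs/inode.c']
```

The catch is the one already mentioned: any tool that modifies a file
without going through the marking step is invisible to the SCM.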

I'd just prefer not to.

For small projects (or big projects with fairly few files), it really 
shouldn't matter. Your 17k files example is hopefully fairly rare..

> But it's more intuitive and more powerful than any alternatives here (Perforce,
> SVN and CVS come to mind).

Good to know.

		Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 20:43                   ` Junio C Hamano
@ 2006-01-31 21:02                     ` Radoslaw Szkodzinski
  0 siblings, 0 replies; 84+ messages in thread
From: Radoslaw Szkodzinski @ 2006-01-31 21:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Greg KH, Keith Packard, cworth, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Junio C Hamano wrote:
> Radoslaw Szkodzinski <astralstorm@gorzow.mm.pl> writes:
> 
>> Radoslaw Szkodzinski wrote:
>>> Cloning without -l option is much slower - some minutes vs below a minute.
>>> I could have time(8)d it, but it's no use.
>>>
>> Make that time(1)d.
>>
>> Results for the kernel follow. Disc cache has been preheated with find.
> 
> While you are at it, "git clone -l -s -n" might be more interesting.
> 
> 

Sure:

time git clone -l -s -n linux-2.6.git linux-2.6.git.lsn

real    0m0.458s
user    0m0.020s
sys     0m0.027s

Speed demon. I'd use it, but I often need a checkout anyway, so...

time git clone -l -s linux-2.6.git linux-2.6.git.ls

real    0m35.752s
user    0m2.661s
sys     0m2.374s

Not really better than git clone -l and relies on the tools more.
However, it should make for easier repacking and pruning. I'll keep it.

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:01               ` Keith Packard
  2006-01-31 19:21                 ` Linus Torvalds
@ 2006-01-31 20:56                 ` Sam Ravnborg
  2006-01-31 22:21                   ` Junio C Hamano
  1 sibling, 1 reply; 84+ messages in thread
From: Sam Ravnborg @ 2006-01-31 20:56 UTC (permalink / raw)
  To: Keith Packard
  Cc: Linus Torvalds, keithp, Johannes Schindelin, Carl Baldwin,
	Junio C Hamano, Martin Langhoff, Git Mailing List

> As a newly initiated user, this would have been a more gentle
> introduction to the system. But, it would be hard to make it entirely
> invisible given the current interfaces. I'm not sure if obscuring the
> presence of the index is a great plan; it's already hard enough to
> figure out how it works.

I have found myself using a mixture of cogito and git commands lately.
Part of it is that my fingers type something like:
rm `git ls-files -m`
cg-restore

and I have not convinced them about git reset --hard


But the primary thing is cg-commit
It gives you a list of files modified which can be edited and
it has saved me a couple of times from committing too much.
And I get vi fired up so no need to fiddle with command line arguments.

   Sam

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
       [not found]                     ` <7vd5i8w2nc.fsf@assigned-by-dhcp.cox.net>
@ 2006-01-31 20:56                       ` J. Bruce Fields
  0 siblings, 0 replies; 84+ messages in thread
From: J. Bruce Fields @ 2006-01-31 20:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jon Loeliger, git

On Tue, Jan 31, 2006 at 12:41:59PM -0800, Junio C Hamano wrote:
> On the tutorial front, maybe we could start teaching people to
> always use "commit -a", and not tell them about update-index nor
> "commit paths.." at all.  Have them do "hello world", review
> changes since the last commit with "git diff", and make commit
> with "git commit -a".  Next tell them about index, and after
> they understand index, finally tell them "commit paths..."  is
> there merely to reduce typing.

Yeah, I think that's approximately what you get right now if you read
tutorial.txt followed by core-tutorial.txt, though the two currently may
not really work together well as sequels.

So I'm inclined to start by revising the two to make them read well as
sequels, then maybe moving some of core-tutorial.txt into the earlier
tutorial.txt.  By the time we're done the two might end up being one
document.  Or they might still be two, but with the split being more
clearly beginning/advanced instead of high-level/low-level.

Feedback from people who'd actually worked through the two would
obviously be useful.

--b.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:50                 ` Radoslaw Szkodzinski
@ 2006-01-31 20:43                   ` Junio C Hamano
  2006-01-31 21:02                     ` Radoslaw Szkodzinski
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2006-01-31 20:43 UTC (permalink / raw)
  To: Radoslaw Szkodzinski
  Cc: Greg KH, Keith Packard, cworth, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Radoslaw Szkodzinski <astralstorm@gorzow.mm.pl> writes:

> Radoslaw Szkodzinski wrote:
>> Cloning without -l option is much slower - some minutes vs below a minute.
>> I could have time(8)d it, but it's no use.
>> 
>
> Make that time(1)d.
>
> Results for the kernel follow. Disc cache has been preheated with find.

While you are at it, "git clone -l -s -n" might be more interesting.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:33                 ` Junio C Hamano
  2006-01-31 19:44                   ` Jon Loeliger
@ 2006-01-31 20:06                   ` J. Bruce Fields
  1 sibling, 0 replies; 84+ messages in thread
From: J. Bruce Fields @ 2006-01-31 20:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Johannes Schindelin, Carl Baldwin, Keith Packard,
	Martin Langhoff, Git Mailing List

On Tue, Jan 31, 2006 at 11:33:21AM -0800, Junio C Hamano wrote:
> I think much of the good stuff git offers would not be helpful to the
> users until index is understood as the third entity, in addition
> to the usual "committed state" and "working tree state".  It
> might be better to talk about it sooner rather than later.  And
> the tool is geared towards taking advantage of it, so until the
> user understands that, behaviour of some tools would feel
> unintuitive.

Yeah, makes sense.  But I'd like to introduce that while still
introducing the higher-level tools earlier on than core-tutorial.txt
does.  I'll give some thought to how to move things in that direction,
maybe this weekend....

--b.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:44                   ` Jon Loeliger
@ 2006-01-31 19:52                     ` Junio C Hamano
       [not found]                     ` <7vd5i8w2nc.fsf@assigned-by-dhcp.cox.net>
  1 sibling, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-01-31 19:52 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: git

Jon Loeliger <jdl@freescale.com> writes:

> I have done this style of "update-index on more-or-less OK
> files" in order to clear up the diff.  And it is also in that
> time frame that I start feeling that certain changes belong
> to "one commit" or another.  The result is, I want to then
> pick the parts that get committed together.  But _really_
> being certain exactly which files, and _only_ those files,
> will really be committed is tough.

	$ git diff --cached

would help.  If you are _only_ committing either all changes or no
change per path, 'git diff --cached --name-status' would be
sufficient.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 18:33               ` Radoslaw Szkodzinski
@ 2006-01-31 19:50                 ` Radoslaw Szkodzinski
  2006-01-31 20:43                   ` Junio C Hamano
  0 siblings, 1 reply; 84+ messages in thread
From: Radoslaw Szkodzinski @ 2006-01-31 19:50 UTC (permalink / raw)
  To: Greg KH
  Cc: Keith Packard, Junio C Hamano, cworth, Martin Langhoff,
	Linus Torvalds, Git Mailing List

Radoslaw Szkodzinski wrote:
> Cloning without -l option is much slower - some minutes vs below a minute.
> I could have time(8)d it, but it's no use.
> 

Make that time(1)d.

Results for the kernel follow. Disc cache has been preheated with find.

git version: 5b2bcc7b2d546c636f79490655b3347acc91d17f
Filesystem: ext3 data=writeback
Kernel: 2.6.16-rc1-astorm2 (mostly -ck patchset with "hotfix")
Elevator: CFQ

time git clone linux-2.6.git linux-2.6.git.new
Packing 180025 objects

real    8m31.637s
user    3m19.571s
sys     0m42.211s

Extremely bad. The task is mostly cpu-bound.
Made some background applications swap out late in the process.
(that's the cause of the sys time)

time git clone -l linux-2.6.git linux-2.6.git.local
0 blocks

real    0m42.339s
user    0m2.818s
sys     0m4.040s

Good enough for me. Possibly cp -rl of objects and then a checkout.

time cp -rl linux-2.6.git linux-2.6.git.rl

real    0m18.333s
user    0m0.103s
sys     0m1.732s

Really fast, but requires additional file modification.
(namely .git/remotes/origin, removal of gitrc)
Also incompatible with apps having problems with hardlinks.
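
For reference, the effect of "cp -rl" can be sketched as follows. This
is an illustration only, assuming a filesystem that supports hard links;
the helper name hardlink_tree and the file name pack-demo are made up.

```python
import os
import tempfile


def hardlink_tree(src, dst):
    """Recreate the directory structure of src under dst, hard-linking
    every file entry that os.walk reports instead of copying its data."""
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target, name))


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "repo.git")
    os.makedirs(os.path.join(src, "objects"))
    obj = os.path.join(src, "objects", "pack-demo")
    with open(obj, "wb") as f:
        f.write(b"x" * 1024)

    dst = os.path.join(d, "repo.git.rl")
    hardlink_tree(src, dst)

    linked = os.path.join(dst, "objects", "pack-demo")
    # Both names refer to one inode, so no file data was duplicated.
    same_inode = os.stat(obj).st_ino == os.stat(linked).st_ino
    link_count = os.stat(obj).st_nlink
    print(same_inode, link_count)
```

Because both names share one inode, a tool that rewrites a file in place
changes it under both names, which is presumably the hardlink problem
mentioned above.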

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:33                 ` Junio C Hamano
@ 2006-01-31 19:44                   ` Jon Loeliger
  2006-01-31 19:52                     ` Junio C Hamano
       [not found]                     ` <7vd5i8w2nc.fsf@assigned-by-dhcp.cox.net>
  2006-01-31 20:06                   ` J. Bruce Fields
  1 sibling, 2 replies; 84+ messages in thread
From: Jon Loeliger @ 2006-01-31 19:44 UTC (permalink / raw)
  To: Git List

On Tue, 2006-01-31 at 13:33, Junio C Hamano wrote:
> "J. Bruce Fields" <bfields@fieldses.org> writes:

> I think much of the good stuff git offers would not be helpful to the
> users until index is understood as the third entity, in addition
> to the usual "committed state" and "working tree state".  It
> might be better to talk about it sooner rather than later.  And
> the tool is geared towards taking advantage of it, so until the
> user understands that, behaviour of some tools would feel
> unintuitive.

Agreed.

> You can have local throw-away modifications while applying
> patches and merging (I once broke merges by ignoring that it is
> perfectly valid to have index and working tree files be
> different and keep working that way.  That was a hard lesson).
> The index file knows what working tree changes are meant to be
> committed.  Another thing I find useful, which cannot be done
> without index, is to sanity check while developing.  When "git
> diff" gives too many diffs, running update-index on paths that I
> think are more-or-less OK helps to reduce clutter, and I can
> view only further changes to those paths.

And right there is where people get caught by surprise.
What "they" then want to do is actually pick certain
files to commit.  And when they do, they get caught off
guard by the _additional_ files.

I have done this style of "update-index on more-or-less OK
files" in order to clear up the diff.  And it is also in that
time frame that I start feeling that certain changes belong
to "one commit" or another.  The result is, I want to then
pick the parts that get committed together.  But _really_
being certain exactly which files, and _only_ those files,
will really be committed is tough.

jdl

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 18:12               ` J. Bruce Fields
@ 2006-01-31 19:33                 ` Junio C Hamano
  2006-01-31 19:44                   ` Jon Loeliger
  2006-01-31 20:06                   ` J. Bruce Fields
  0 siblings, 2 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-01-31 19:33 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Linus Torvalds, Johannes Schindelin, Carl Baldwin, Keith Packard,
	Martin Langhoff, Git Mailing List

"J. Bruce Fields" <bfields@fieldses.org> writes:

> On Tue, Jan 31, 2006 at 09:30:48AM -0800, Linus Torvalds wrote:
>>
>> The "ignore the index" approach is the simple one to explain. It's 
>> strictly less powerful, but hey, what else is new? 
>
> Yeah, I do wonder what's likely to be the best approach for most users.
> My goal with the new tutorial was to get a reader doing something fun
> and useful as quickly as possible.  So it just refers elsewhere for any
> discussion of the index file or SHA1 names.  But probably everyone needs
> to pick up that stuff eventually anyway, and maybe it's better to get to
> it a little sooner, I dunno.

I think much of the good stuff git offers would not be helpful to the
users until index is understood as the third entity, in addition
to the usual "committed state" and "working tree state".  It
might be better to talk about it sooner rather than later.  And
the tool is geared towards taking advantage of it, so until the
user understands that, behaviour of some tools would feel
unintuitive.

You can have local throw-away modifications while applying
patches and merging (I once broke merges by ignoring that it is
perfectly valid to have index and working tree files be
different and keep working that way.  That was a hard lesson).
The index file knows what working tree changes are meant to be
committed.  Another thing I find useful, which cannot be done
without index, is to sanity check while developing.  When "git
diff" gives too many diffs, running update-index on paths that I
think are more-or-less OK helps to reduce clutter, and I can
view only further changes to those paths.

In a sense, update-index can be thought of to check in the
changes without committing.  You can check in any number of times,
and the cumulative effect is committed later.  "reset --mixed"
is undoing these uncommitted check-ins.  "reset --hard" undoes
the last commit.
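
The "third entity" view can be sketched as a toy model. Everything here
is an assumption made for illustration (the class Repo, its method names,
dicts as snapshots); deletions and untracked files are ignored.

```python
# Toy model of the three states. Hypothetical names, not git internals.
class Repo:
    def __init__(self, files):
        self.head = dict(files)      # committed state
        self.index = dict(files)     # "checked in" but uncommitted state
        self.worktree = dict(files)  # working tree state

    def update_index(self, path):
        """Check a path in without committing."""
        self.index[path] = self.worktree[path]

    def diff(self):
        """Like plain 'git diff': working tree vs index."""
        return sorted(p for p in self.worktree
                      if self.worktree[p] != self.index.get(p))

    def diff_cached(self):
        """Like 'git diff --cached': index vs HEAD."""
        return sorted(p for p in self.index
                      if self.index[p] != self.head.get(p))

    def reset_mixed(self):
        """Undo the uncommitted check-ins; the working tree is kept."""
        self.index = dict(self.head)


repo = Repo({"a.c": "v1", "b.c": "v1"})
repo.worktree["a.c"] = "v2"
repo.worktree["b.c"] = "v2"
print(repo.diff())         # ['a.c', 'b.c']

repo.update_index("a.c")   # a.c looks more-or-less OK: reduce clutter
print(repo.diff())         # ['b.c']
print(repo.diff_cached())  # ['a.c']

repo.worktree["a.c"] = "v3"
print(repo.diff())         # ['a.c', 'b.c']  (only the further change to a.c)

repo.reset_mixed()
print(repo.diff_cached())  # []
```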

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:01               ` Keith Packard
@ 2006-01-31 19:21                 ` Linus Torvalds
  2006-01-31 22:55                   ` Joel Becker
  2006-01-31 20:56                 ` Sam Ravnborg
  1 sibling, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2006-01-31 19:21 UTC (permalink / raw)
  To: Keith Packard
  Cc: Johannes Schindelin, Carl Baldwin, Junio C Hamano,
	Martin Langhoff, Git Mailing List



On Tue, 31 Jan 2006, Keith Packard wrote:

> On Tue, 2006-01-31 at 09:30 -0800, Linus Torvalds wrote:
> 
> >  - ignore it. Never _ever_ use git-update-index directly, and don't tell 
> >    people about using individual filenames with git-commit. Maybe even add 
> >    "-a" by default to the git-commit flags as a special installation 
> >    addition.
> 
> As a newly initiated user, this would have been a more gentle
> introduction to the system. But, it would be hard to make it entirely
> invisible given the current interfaces. I'm not sure if obscuring the
> presence of the index is a great plan; it's already hard enough to
> figure out how it works.

Now, I do agree. I don't actually like hiding the index too much. 
Understanding the index is _invaluable_ whenever you're doing a merge with 
conflicts, and understanding what tools are available to you to resolve 
those conflicts.

The index is also obviously very important when you do a partial commit, 
and it's something I do end up doing quite often. Again, maybe that's not 
something that a new git user should be encouraged to ever do, but it's a 
huge convenience feature for power-users.

Understanding the index also allows people to understand certain 
performance-characteristics of git, and explains how "git add" (and 
remove, if we had one) actually works independently of the commit. 

So I'm actually of the "revel in the index" camp (as could probably be 
guessed by the original tutorial).

My personal suggestion would be to introduce git "gently" by ignoring it, 
but by the time a person actually _works_ on a project (as opposed to just 
going through a tutorial or following another person's project), he/she 
should probably have been introduced to the index in order to understand 
what happens and to use its power.

(In particular, the difference between "git diff" and "git diff HEAD" is 
an important one to understand eventually).

			Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 17:30             ` Linus Torvalds
  2006-01-31 18:12               ` J. Bruce Fields
@ 2006-01-31 19:01               ` Keith Packard
  2006-01-31 19:21                 ` Linus Torvalds
  2006-01-31 20:56                 ` Sam Ravnborg
  2006-02-01 19:34               ` H. Peter Anvin
  2 siblings, 2 replies; 84+ messages in thread
From: Keith Packard @ 2006-01-31 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keithp, Johannes Schindelin, Carl Baldwin, Junio C Hamano,
	Martin Langhoff, Git Mailing List

On Tue, 2006-01-31 at 09:30 -0800, Linus Torvalds wrote:

>  - ignore it. Never _ever_ use git-update-index directly, and don't tell 
>    people about using individual filenames with git-commit. Maybe even add 
>    "-a" by default to the git-commit flags as a special installation 
>    addition.

As a newly initiated user, this would have been a more gentle
introduction to the system. But, it would be hard to make it entirely
invisible given the current interfaces. I'm not sure if obscuring the
presence of the index is a great plan; it's already hard enough to
figure out how it works.

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-29 18:12             ` Greg KH
@ 2006-01-31 18:33               ` Radoslaw Szkodzinski
  2006-01-31 19:50                 ` Radoslaw Szkodzinski
  0 siblings, 1 reply; 84+ messages in thread
From: Radoslaw Szkodzinski @ 2006-01-31 18:33 UTC (permalink / raw)
  To: Greg KH
  Cc: Keith Packard, Junio C Hamano, cworth, Martin Langhoff,
	Linus Torvalds, Git Mailing List

Greg KH wrote:
> On Sun, Jan 29, 2006 at 12:18:45PM +0100, Radoslaw Szkodzinski wrote:
>> The only drawback is local cloning. This operation is like 4x slower
>> than plain copying of the repository. Probably because it works like an
>> ssh clone - creates a pack, copies it, then unpacks. This is just
>> inefficient on a local machine.
> 
> Have you tried the "-l" option for cloning locally?  It's _very_ fast,
> even for my tiny little old laptop.

Because it's cp -rl <one-tree> <second-tree> and some file modifications, right?
It's what I've been using already.

This -l option should be more prominent in the documentation.
Maybe it even already is. I taught myself to use git before 0.9.

Thank you. This helps a lot.

> If you add a "-n" that will not checkout the source tree, so you can
> compare the time of cloning with the checkout portion.

Cloning without -l option is much slower - some minutes vs below a minute.
I could have time(8)d it, but it's no use.

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 17:30             ` Linus Torvalds
@ 2006-01-31 18:12               ` J. Bruce Fields
  2006-01-31 19:33                 ` Junio C Hamano
  2006-01-31 19:01               ` Keith Packard
  2006-02-01 19:34               ` H. Peter Anvin
  2 siblings, 1 reply; 84+ messages in thread
From: J. Bruce Fields @ 2006-01-31 18:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Carl Baldwin, Junio C Hamano, Keith Packard,
	Martin Langhoff, Git Mailing List

On Tue, Jan 31, 2006 at 09:30:48AM -0800, Linus Torvalds wrote:
> I really think you should explain it one of two ways:
> 
>  - ignore it. Never _ever_ use git-update-index directly, and don't tell 
> >    people about using individual filenames with git-commit. Maybe even add 
>    "-a" by default to the git-commit flags as a special installation 
>    addition.
> 
>  - talk about the index, and revel in it as a way to explain the staging 
>    area. This is what the old tutorial.txt did before it got simplified.
> 
> The "ignore the index" approach is the simple one to explain. It's 
> strictly less powerful, but hey, what else is new? 

Yeah, I do wonder what's likely to be the best approach for most users.
My goal with the new tutorial was to get a reader doing something fun
and useful as quickly as possible.  So it just refers elsewhere for any
discussion of the index file or SHA1 names.  But probably everyone needs
to pick up that stuff eventually anyway, and maybe it's better to get to
it a little sooner, I dunno.

Besides the git-add/git-commit thing, the other thing that caught me by
surprise was the behaviour of git reset.  I expected there to be an
"inverse" to git commit -a, meaning that

	1) the sequence
		git reset HEAD^
		git commit -a
	   would be a no-op, in the sense that the new commit would
	   get the same changes as the old one, and
	2) the sequence
		git commit -a
		git reset HEAD^
	   would be a no-op, in the sense that "git diff" would report
	   the same diff before and after.

But there isn't, and explaining how --soft and --mixed actually work
requires referring to the index file.
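
One way expectation 1) can fail is sketched below in a toy model. The
function names and the list-of-dicts history are hypothetical; the model
only assumes that "commit -a" refreshes paths already in the index and
that a --mixed reset makes the index match the new HEAD while leaving the
working tree alone.

```python
# Toy model, hypothetical names. history is a list of committed
# snapshots (dicts of path -> content); the last entry is HEAD.

def commit_all(history, index, worktree):
    """Like 'git commit -a': refresh paths already in the index from the
    working tree, then commit the index. New files are not picked up."""
    for p in list(index):
        if p in worktree:
            index[p] = worktree[p]
        else:
            del index[p]
    history.append(dict(index))


def reset_mixed_parent(history, index):
    """Like 'git reset --mixed HEAD^': drop the last commit and make the
    index match the new HEAD; the working tree is untouched."""
    history.pop()
    index.clear()
    index.update(history[-1])


def reset_soft_parent(history):
    """Like 'git reset --soft HEAD^': move HEAD only."""
    history.pop()


def fresh_state():
    # The last commit changed a.c and added new.c.
    history = [{"a.c": "1"}, {"a.c": "2", "new.c": "n"}]
    return history, dict(history[-1]), dict(history[-1])


old_tip = {"a.c": "2", "new.c": "n"}

history, index, worktree = fresh_state()
reset_mixed_parent(history, index)
commit_all(history, index, worktree)
mixed_roundtrip = history[-1]
print(mixed_roundtrip == old_tip)  # False: new.c is no longer in the index

history, index, worktree = fresh_state()
reset_soft_parent(history)
commit_all(history, index, worktree)
soft_roundtrip = history[-1]
print(soft_roundtrip == old_tip)   # True: the index still knew about new.c
```

The difference between the two round trips lives entirely in the index,
which is why it is hard to describe without mentioning it.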

Is that something that can be fixed in the tools or does the user
fundamentally need to know about the index file to do this kind of
stuff?

--b.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 10:27           ` Johannes Schindelin
  2006-01-31 15:24             ` Carl Baldwin
@ 2006-01-31 17:30             ` Linus Torvalds
  2006-01-31 18:12               ` J. Bruce Fields
                                 ` (2 more replies)
  2006-01-31 23:16             ` Daniel Barkalow
  2 siblings, 3 replies; 84+ messages in thread
From: Linus Torvalds @ 2006-01-31 17:30 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Carl Baldwin, Junio C Hamano, Keith Packard, Martin Langhoff,
	Git Mailing List



On Tue, 31 Jan 2006, Johannes Schindelin wrote:
> 
> On Mon, 30 Jan 2006, Carl Baldwin wrote:
> 
> > In general, I think it is grasping the reason for the index file and how 
> > git commands like git-commit and git-diff interact with it.
> 
> IMHO this is the one big showstopper. I had problems explaining the 
> concept myself.
> 
> For example, I had a hard time explaining to a friend why a git-add'ed 
> file is committed when saying "git commit some_other_file", but not 
> another (modified) file. Very unintuitive.

I really think you should explain it one of two ways:

 - ignore it. Never _ever_ use git-update-index directly, and don't tell 
   people about using individual filenames with git-commit. Maybe even add 
   "-a" by default to the git-commit flags as a special installation 
   addition.

 - talk about the index, and revel in it as a way to explain the staging 
   area. This is what the old tutorial.txt did before it got simplified.

The "ignore the index" approach is the simple one to explain. It's 
strictly less powerful, but hey, what else is new? 

		Linus


* Re: [Census] So who uses git?
  2006-01-31 15:24             ` Carl Baldwin
@ 2006-01-31 15:31               ` Johannes Schindelin
  0 siblings, 0 replies; 84+ messages in thread
From: Johannes Schindelin @ 2006-01-31 15:31 UTC (permalink / raw)
  To: Carl Baldwin
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Hi,

On Tue, 31 Jan 2006, Carl Baldwin wrote:

> It's difficult to explain because it breaks away from the precedent set
> by other SCMs.  I wouldn't call it a show-stopper for this reason.

I don't.

The strange concept from the user's perspective is that

	git commit -m "some message" file-a.txt

can commit file-b.txt also.
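
A minimal sketch of the behaviour described above, assuming a toy model where trees are dicts of path to content. The function names commit_include and commit_only are hypothetical; they are loosely modelled on what git-commit's --include and --only options do, not on git's actual code.

```python
# Toy model: trees are dicts of path -> content. Function names are hypothetical.

def commit_include(index, worktree, paths):
    """Stage the named paths on top of whatever is already in the index,
    then commit the whole index."""
    staged = dict(index)
    for path in paths:
        staged[path] = worktree[path]
    return staged  # tree of the new commit

def commit_only(head, worktree, paths):
    """Commit just the named paths on top of HEAD, ignoring other staged changes."""
    tree = dict(head)
    for path in paths:
        tree[path] = worktree[path]
    return tree

head = {"file-a.txt": "a1", "file-c.txt": "c1"}
# file-b.txt is new and was git-add'ed; file-c.txt is modified but not staged.
index = {"file-a.txt": "a1", "file-b.txt": "b1", "file-c.txt": "c1"}
worktree = {"file-a.txt": "a2", "file-b.txt": "b1", "file-c.txt": "c2"}

print(commit_include(index, worktree, ["file-a.txt"]))
# {'file-a.txt': 'a2', 'file-b.txt': 'b1', 'file-c.txt': 'c1'}
print(commit_only(head, worktree, ["file-a.txt"]))
# {'file-a.txt': 'a2', 'file-c.txt': 'c1'}
```

With the include-style commit, file-b.txt rides along because it was already staged, while the unstaged edit to file-c.txt is left out. That is the asymmetry the friend found unintuitive.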

> [...] In other circumstances I simply bypass it by adding -a to the 
> command-line.

This is a different thing.

Ciao,
Dscho


* Re: [Census] So who uses git?
  2006-01-31 10:27           ` Johannes Schindelin
@ 2006-01-31 15:24             ` Carl Baldwin
  2006-01-31 15:31               ` Johannes Schindelin
  2006-01-31 17:30             ` Linus Torvalds
  2006-01-31 23:16             ` Daniel Barkalow
  2 siblings, 1 reply; 84+ messages in thread
From: Carl Baldwin @ 2006-01-31 15:24 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

It's difficult to explain because it breaks away from the precedent set
by other SCMs.  I wouldn't call it a show-stopper for this reason.  In
fact, some who have wrapped their heads around the concept might call it
a valuable feature.  I, myself, have found it a handy thing in certain
circumstances.  In other circumstances I simply bypass it by adding -a
to the command-line.

This doesn't fit my definition of a show-stopper.

Carl

On Tue, Jan 31, 2006 at 11:27:34AM +0100, Johannes Schindelin wrote:
> Hi,
> 
> On Mon, 30 Jan 2006, Carl Baldwin wrote:
> 
> > In general, I think it is grasping the reason for the index file and how 
> > git commands like git-commit and git-diff interact with it.
> 
> IMHO this is the one big showstopper. I had problems explaining the 
> concept myself.
> 
> For example, I had a hard time explaining to a friend why a git-add'ed 
> file is committed when saying "git commit some_other_file", but not 
> another (modified) file. Very unintuitive.
> 
> Ciao,
> Dscho
> 
> 

-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Carl Baldwin                        RADCAD (R&D CAD)
 Hewlett Packard Company
 MS 88                               work: 970 898-1523
 3404 E. Harmony Rd.                 work: Carl.N.Baldwin@hp.com
 Fort Collins, CO 80525              home: Carl@ecBaldwin.net
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


* Re: [Census] So who uses git?
  2006-01-30 18:58         ` Carl Baldwin
@ 2006-01-31 10:27           ` Johannes Schindelin
  2006-01-31 15:24             ` Carl Baldwin
                               ` (2 more replies)
  2006-02-01 19:32           ` H. Peter Anvin
  1 sibling, 3 replies; 84+ messages in thread
From: Johannes Schindelin @ 2006-01-31 10:27 UTC (permalink / raw)
  To: Carl Baldwin
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Hi,

On Mon, 30 Jan 2006, Carl Baldwin wrote:

> In general, I think it is grasping the reason for the index file and how 
> git commands like git-commit and git-diff interact with it.

IMHO this is the one big showstopper. I had problems explaining the 
concept myself.

For example, I had a hard time explaining to a friend why a git-add'ed 
file is committed when saying "git commit some_other_file", but not 
another (modified) file. Very unintuitive.

Ciao,
Dscho


* Re: [Census] So who uses git?
  2006-01-29 11:18           ` Radoslaw Szkodzinski
  2006-01-29 18:12             ` Greg KH
@ 2006-01-30 22:51             ` Alex Riesen
  2006-01-31 21:25               ` Linus Torvalds
  1 sibling, 1 reply; 84+ messages in thread
From: Alex Riesen @ 2006-01-30 22:51 UTC (permalink / raw)
  To: Radoslaw Szkodzinski
  Cc: Keith Packard, Junio C Hamano, cworth, Martin Langhoff,
	Linus Torvalds, Git Mailing List

Radoslaw Szkodzinski, Sun, Jan 29, 2006 12:18:45 +0100:
> > Fortunately, there are very few people involved with any specific piece
> > of the X.org distribution; there's really only one or two people
> > actively developing the X.org core server, so that part of the migration
> > will be easy. Our users will be stuck, but there aren't many of them
> > either, and git makes just sucking the current bits pretty easy. 
> 
> Not under Windows (bleh), but its support for Cygwin is getting better
> and better.
> 

I use git in cygwin for a project with more than 17k files (almost 6M lines).
It's real slow on ntfs (on 3.2GHz PIV!), PITA on fat, and has some hiccups now
and then (of the kind: "windows unexpectedly does not have feature X, which
everything else has" or "windows broke a 20-year-old feature Y").

But it's more intuitive and more powerful than any alternatives here (Perforce,
SVN and CVS come to mind).


* Re: [Census] So who uses git?
  2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
                           ` (2 preceding siblings ...)
  2006-01-29 18:37         ` Dave Jones
@ 2006-01-30 18:58         ` Carl Baldwin
  2006-01-31 10:27           ` Johannes Schindelin
  2006-02-01 19:32           ` H. Peter Anvin
  3 siblings, 2 replies; 84+ messages in thread
From: Carl Baldwin @ 2006-01-30 18:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Keith Packard, Martin Langhoff, Linus Torvalds, Git Mailing List

Junio,

You don't seem to give git enough credit.  I am a hardware engineer with
many softwareish responsibilities.  One of those is to keep up to date
with the many commercial and free SCM type tools that are available.

Git has become my SCM tool of choice for many reasons.

- Anyone can install and fire it up without license/contract hassles.

- The infrastructure barriers to getting a project started with git are
  about as low as they can be.

- Geographically distributed teams even inside a corporation are
  becoming more common.  Git's repository design meets this need
  perfectly.

- The repository is also designed to be inherently safe from
  data-loss and corruption even in the face of concurrent writes due to
  each object's immutable nature.

- While on the subject of the repository.  Good job keeping it simple.
  I was able to learn pretty much all there is to know from a technical
  stand-point about the objects and refs directories in an afternoon.
  It follows a principle I always work toward myself.  "Make it simple
  enough that there are obviously no deficiencies rather than making it
  complicated so that there are no obvious deficiencies."

- In my opinion git is flexible enough to support just about any
  development/build/release flow that one can think of.  Most of the
  free tools (including subversion and arch) make branching and merging
  --- on which most of these flows rely --- way too heavy-weight.  Git
  shows how light-weight it can be.

  Not only can parallel development happen easily between
  users/repositories but parallel development is trivial even within the
  same repository.  I  think your 'pu' system illustrates how powerful
  it can be.  I myself have had up to four concurrent branches where I
  implemented four different features in parallel in the same repository
  easily switching between them.  It was almost too easy to bring them
  together using merge as each one finished.

  I was just reading through an article on how to choose an SCM last
  week and I kept thinking how git could be used to meet almost every
  one (if not all) of the needs discussed.

- Git supports enough network protocols to make it immediately useful in
  about any situation with firewalls and such.  This is where it leaves
  monotone behind.

The biggest hurdle that I've seen in adopting git is training the users.
I myself took to it like a duck to water but I've found that even some
of my brightest colleagues have trouble wrapping their heads around it.
Currently, I'm trying to look at what parts they are having the most
trouble with.  In general, I think it is grasping the reason for the
index file and how git commands like git-commit and git-diff interact
with it.

Even so, I've always appreciated those tools that may have a steeper
learning curve but that pay dividends over time.  Also, I should mention
that this learning curve has been flattening over time as git has
developed and obtained more porcelainish commands.

Carl

On Sat, Jan 28, 2006 at 01:08:54PM -0800, Junio C Hamano wrote:
> Keith Packard <keithp@keithp.com> writes:
> 
> Wow.......  You are switching Cairo and X.org from CVS to git?
> 
> It could be that anything is better than CVS these days, but I
> have to admit that my jaw dropped after reading this, primarily
> because I have never touched anything as big as X.
> 
> Awestruck, dumbstruck,... Xstruck.  Yeah, I know I should have
> more faith in git.  Earlier I heard Wine folks are running git
> in parallel with CVS as their dual primary SCM now, and of
> course git is the primary SCM for the Linux kernel project.

-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Carl Baldwin                        RADCAD (R&D CAD)
 Hewlett Packard Company
 MS 88                               work: 970 898-1523
 3404 E. Harmony Rd.                 work: Carl.N.Baldwin@hp.com
 Fort Collins, CO 80525              home: Carl@ecBaldwin.net
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


* Re: [Census] So who uses git?
  2006-01-29 20:17           ` Daniel Barkalow
  2006-01-29 20:29             ` Martin Langhoff
@ 2006-01-30 15:23             ` Mike McCormack
  1 sibling, 0 replies; 84+ messages in thread
From: Mike McCormack @ 2006-01-30 15:23 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Dave Jones, Junio C Hamano, Keith Packard, Martin Langhoff,
	Linus Torvalds, Git Mailing List


Daniel Barkalow wrote:
> I think we'll see a lot more adoption when we have a CVS daemon interface 
> (so projects can stop having a CVS repository, and support both sorts of 
> users with a git repository and have better metadata), and also if someone 
> sets up a place for putting git imports of CVS projects, so people will 
> know that other people are using git.

The Wine project is using a GIT repository which is mirrored into CVS. 
Alexandre wrote scripts to mirror GIT commits into CVS, so developers 
can use whichever they're more comfortable with, and the CVS repository 
remains up to date.

We've found that patch submitters using GIT tend to send multiple 
patches per day, and that those using CVS tend to send a patch or two 
occasionally or just keep up to date with the source.

Mike


* Re: [Census] So who uses git?
  2006-01-29 20:17           ` Daniel Barkalow
@ 2006-01-29 20:29             ` Martin Langhoff
  2006-01-30 15:23             ` Mike McCormack
  1 sibling, 0 replies; 84+ messages in thread
From: Martin Langhoff @ 2006-01-29 20:29 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Dave Jones, Junio C Hamano, Keith Packard, Linus Torvalds,
	Git Mailing List

On 1/30/06, Daniel Barkalow <barkalow@iabervon.org> wrote:
> > There's also another git usage that I doubt I'm alone in doing.
> > I regularly use git to import cvs trees from sourceforge etc for
> > random projects, because I now find browsing history of projects
> > with tools like gitk much nicer than any cvs tool I've used.
> > (cvs annotate is the only thing I really miss).
>
> I think this is the real driving factor for git adoption: it doesn't have
> to be 10x better for people to use it, because individuals can use it for
> interacting with CVS projects without causing anybody else any pain.

IMHO, this is a killer feature of GIT. From a CVS/SVN user point of
view, it has vendor branches done right. At work, we do that with
Moodle, Elgg, EPrints and GForge. And the list is growing. That's why
I'm working on the toolchain to make interop with CVS smooth so I can
land patches in  upstream projects where I have cvs access.

cheers,


m


* Re: [Census] So who uses git?
  2006-01-29 18:37         ` Dave Jones
@ 2006-01-29 20:17           ` Daniel Barkalow
  2006-01-29 20:29             ` Martin Langhoff
  2006-01-30 15:23             ` Mike McCormack
  0 siblings, 2 replies; 84+ messages in thread
From: Daniel Barkalow @ 2006-01-29 20:17 UTC (permalink / raw)
  To: Dave Jones
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

On Sun, 29 Jan 2006, Dave Jones wrote:

> On Sat, Jan 28, 2006 at 01:08:54PM -0800, Junio C Hamano wrote:
>  > Can I hear experiences from other big projects that tried to use
>  > git [*1*]?  I suspect there are many that have tried, and I
>  > would not be surprised at all if git did not work out well for
>  > them.  For projects that already run on a (free) SCM, I would be
>  > very surprised if the developers find the current git 10 times
>  > better than the SCM they have been using (probably with an
>  > exception of CVS), unless they have very specific need, such as
>  > parallel development of distributed nature like the Linux
>  > kernel.
> 
> I've found switching from cvs->git even for small projects has
> made me more productive.  In part because it's got me away from
> the 'check in to a centralised server like sourceforge' mentality,
> without the need to set up a local cvs server of my own.
> Adding changesets to a small project like x86info, now takes
> seconds, whereas it used to take minutes of thumb-twiddling whilst
> I waited for sf.net to do its thing.   The ability to check in
> changesets locally whilst I'm travelling, and then push them when
> I have network connectivity again is also a massive productivity
> win over cvs.
> 
> There's also another git usage that I doubt I'm alone in doing.
> I regularly use git to import cvs trees from sourceforge etc for
> random projects, because I now find browsing history of projects
> with tools like gitk much nicer than any cvs tool I've used.
> (cvs annotate is the only thing I really miss).

I think this is the real driving factor for git adoption: it doesn't have 
to be 10x better for people to use it, because individuals can use it for 
interacting with CVS projects without causing anybody else any pain. It 
doesn't just enable distributed development, it enables a distributed 
choice of SCM, which means a much lower activation energy threshold. I 
think we'll see a lot more adoption when we have a CVS daemon interface 
(so projects can stop having a CVS repository, and support both sorts of 
users with a git repository and have better metadata), and also if someone 
sets up a place for putting git imports of CVS projects, so people will 
know that other people are using git.

	-Daniel
*This .sig left intentionally blank*


* Re: [Census] So who uses git?
  2006-01-29 14:19             ` Morten Welinder
@ 2006-01-29 20:15               ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-01-29 20:15 UTC (permalink / raw)
  To: Morten Welinder; +Cc: git

Morten Welinder <mwelinder@gmail.com> writes:

> If I understand this right, that means that for a log file (in this
> case a ChangeLog file) that is appended to linearly as a
> function of revision number, we have...
>
> cvs: O(n) archive size
> git: O(n*n) archive size
>
> At least that is what we get if revision N is always deltad over
> revision N-1.  A good deal could be saved if instead of dumping
> a full copy every 10 revisions, that revision would instead be
> deltad off an earlier revision, but I think it'll still be O(n*n).

I have not counted O()rders, but it is not as simple as that,
because we do not really compare "versions".  If version N
reverts a change version N-1 introduced since version N-2, we
would not even store a copy for version N and version N-2
separately.  We just store a single copy, which may be delta
information against version N-1 (or the other way around and N-1
might be delta against N).

For the sake of math, let's say this project keeps only one
file, append only ChangeLog, with a straight line of development
without branches ("single strand of pearls"), and has revisions
1..N.

In RCS, you would have a full copy of the revision N, and
revision J is recorded as delta from revision J+1 for 1 <= J < N.
This delta is similar to "ed" script, and going backwards in the
history for the ChangeLog example means only line deletion is
involved, so what was removed is not recorded.  It records how
many lines are removed from where.  This is _very_ efficient and
compact.

In git, we would have a full copy of version N (because we favor
keeping larger blob associated with newer commits as a full
copy), and essentially the same thing as RCS happens.  The only
difference is that our "delta" is binary delta, but in this
case, it just records "copy N bytes from here to here" which
results in about the same amount of information to represent
each delta.  As you say, if (10 < N), we would have a full copy
every once in a while.  You could use depth other than the
default to make this chaining longer and if you did so, your
repository would be *very* compactly compressed.

However, retrieving cost of version 1 is quite different.  RCS
format is O(n) -- you start from the tip, extract and interpret
(N-1) deltas and apply them in turn to get to what you want.

The cost of extracting an arbitrary version is bounded in git
packfile, because you need to do such an "extract, interpret and
apply" at most $depth cycles.  This is primarily because we do
not store "versions" but individual objects, and do not apply
"newer revisions are far more likely to be accessed often"
heuristics, which RCS format is designed for.
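
The retrieval-cost argument can be sketched numerically. This is a deliberately simplified model, not how git actually picks delta bases: it assumes that, walking back from the tip, every (depth + 1)-th version is stored as a full copy and the versions in between are deltas chained toward it. Both function names are made up for illustration.

```python
def rcs_delta_applications(version, tip):
    """RCS trunk: full copy at the tip, each older version is a reverse delta,
    so reaching `version` means applying one delta per step back from the tip."""
    return tip - version

def bounded_chain_applications(version, tip, depth=10):
    """Simplified pack model (an assumption, see above): a full copy every
    (depth + 1) versions, so a chain never exceeds `depth` deltas."""
    return (tip - version) % (depth + 1)

tip = 1000
for v in (1000, 995, 500, 1):
    print(v, rcs_delta_applications(v, tip), bounded_chain_applications(v, tip))
# 1000 0 0
# 995 5 5
# 500 500 5
# 1 999 9
```

The RCS cost grows with the distance from the tip, while the bounded chain never needs more than `depth` applications, whichever version is requested.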


* Re: [Census] So who uses git?
  2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
  2006-01-29  2:14         ` Morten Welinder
  2006-01-29 10:09         ` Keith Packard
@ 2006-01-29 18:37         ` Dave Jones
  2006-01-29 20:17           ` Daniel Barkalow
  2006-01-30 18:58         ` Carl Baldwin
  3 siblings, 1 reply; 84+ messages in thread
From: Dave Jones @ 2006-01-29 18:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Keith Packard, Martin Langhoff, Linus Torvalds, Git Mailing List

On Sat, Jan 28, 2006 at 01:08:54PM -0800, Junio C Hamano wrote:
 > Can I hear experiences from other big projects that tried to use
 > git [*1*]?  I suspect there are many that have tried, and I
 > would not be surprised at all if git did not work out well for
 > them.  For projects that already run on a (free) SCM, I would be
 > very surprised if the developers find the current git 10 times
 > better than the SCM they have been using (probably with an
 > exception of CVS), unless they have very specific need, such as
 > parallel development of distributed nature like the Linux
 > kernel.

I've found switching from cvs->git even for small projects has
made me more productive.  In part because it's got me away from
the 'check in to a centralised server like sourceforge' mentality,
without the need to set up a local cvs server of my own.
Adding changesets to a small project like x86info, now takes
seconds, whereas it used to take minutes of thumb-twiddling whilst
I waited for sf.net to do its thing.   The ability to check in
changesets locally whilst I'm travelling, and then push them when
I have network connectivity again is also a massive productivity
win over cvs.

There's also another git usage that I doubt I'm alone in doing.
I regularly use git to import cvs trees from sourceforge etc for
random projects, because I now find browsing history of projects
with tools like gitk much nicer than any cvs tool I've used.
(cvs annotate is the only thing I really miss).

What would be really cool, would be a web page pointing to public
conversions of various projects cvs trees, so that everyone doesn't
have to keep hammering various repos to do the conversions themselves.
(Sort of a pseudo bkbits.net).

		Dave


* Re: [Census] So who uses git?
  2006-01-29 11:18           ` Radoslaw Szkodzinski
@ 2006-01-29 18:12             ` Greg KH
  2006-01-31 18:33               ` Radoslaw Szkodzinski
  2006-01-30 22:51             ` Alex Riesen
  1 sibling, 1 reply; 84+ messages in thread
From: Greg KH @ 2006-01-29 18:12 UTC (permalink / raw)
  To: Radoslaw Szkodzinski
  Cc: Keith Packard, Junio C Hamano, cworth, Martin Langhoff,
	Linus Torvalds, Git Mailing List

On Sun, Jan 29, 2006 at 12:18:45PM +0100, Radoslaw Szkodzinski wrote:
> 
> The only drawback is local cloning. This operation is like 4x slower
> than plain copying of the repository. Probably because it works like an
> ssh clone - creates a pack, copies it, then unpacks. This is just
> inefficient on a local machine.

Have you tried the "-l" option for cloning locally?  It's _very_ fast,
even for my tiny little old laptop.

If you add a "-n" that will not checkout the source tree, so you can
compare the time of cloning with the checkout portion.

thanks,

greg k-h


* Re: [Census] So who uses git?
  2006-01-29  3:53           ` Junio C Hamano
@ 2006-01-29 14:19             ` Morten Welinder
  2006-01-29 20:15               ` Junio C Hamano
  0 siblings, 1 reply; 84+ messages in thread
From: Morten Welinder @ 2006-01-29 14:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> I think that 40% sounds about right.  My understanding of the
> underlying format CVS uses, RCS, is that it stores a full copy
> of the tip of trunk uncompressed, and other versions of the file
> are represented as incremental delta from that.  The packed git
> format does not favor particular version based on the distance
> from the tip, and stores either a compressed full copy, or a
> delta from some other revision (which may not necessarily be
> represented as a full copy).  When we store something as a delta
> from something else, we limit the length of the delta chain to a
> full copy to 10 (by default), so that you can get to a specific
> object with at most 10 applications of delta on top of a full
> copy.

If I understand this right, that means that for a log file (in this
case a ChangeLog file) that is appended to linearly as a
function of revision number, we have...

cvs: O(n) archive size
git: O(n*n) archive size

At least that is what we get if revision N is always deltad over
revision N-1.  A good deal could be saved if instead of dumping
a full copy every 10 revisions, that revision would instead be
deltad off an earlier revision, but I think it'll still be O(n*n).
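
A rough sketch of the size estimate above, under a hypothetical cost model: the file grows by `line` bytes per revision, every delta costs a constant `delta` bytes, and a full copy of revision k costs k * line bytes. Compression is ignored, and the function names are illustrative only.

```python
def single_full_copy(n, line=80, delta=80):
    """One full copy of the tip plus n - 1 constant-size deltas: O(n)."""
    return n * line + (n - 1) * delta

def periodic_full_copies(n, line=80, delta=80, depth=10):
    """A full copy every (depth + 1) revisions, counting back from the tip;
    all other revisions are constant-size deltas. The full copies add up
    to roughly line * n * n / (2 * (depth + 1)) bytes: O(n*n)."""
    total = 0
    for k in range(1, n + 1):
        if (n - k) % (depth + 1) == 0:
            total += k * line      # full copy of an append-only file at revision k
        else:
            total += delta
    return total

for n in (100, 1000, 10000):
    print(n, single_full_copy(n), periodic_full_copies(n))
```

Doubling n roughly doubles the first figure and roughly quadruples the second. A deeper chain shrinks the constant factor but, in this model, not the quadratic growth.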

(/me prepares for Linus chiming in and telling me I should not
keep ChangeLog files, :-)

M.


* Re: [Census] So who uses git?
  2006-01-29 10:09         ` Keith Packard
@ 2006-01-29 11:18           ` Radoslaw Szkodzinski
  2006-01-29 18:12             ` Greg KH
  2006-01-30 22:51             ` Alex Riesen
  0 siblings, 2 replies; 84+ messages in thread
From: Radoslaw Szkodzinski @ 2006-01-29 11:18 UTC (permalink / raw)
  To: Keith Packard
  Cc: Junio C Hamano, cworth, Martin Langhoff, Linus Torvalds,
	Git Mailing List


Keith Packard wrote:
> Fortunately, there are very few people involved with any specific piece
> of the X.org distribution; there's really only one or two people
> actively developing the X.org core server, so that part of the migration
> will be easy. Our users will be stuck, but there aren't many of them
> either, and git makes just sucking the current bits pretty easy. 
>  

Not under Windows (bleh), but its support for Cygwin is getting better
and better.

> I don't know of other huge projects moving to git; it's not all that
> interesting as we know the tool is stable and will scale to support our
> project already. Also, hg and bzr are not ready for production use in my
> opinion; hg as it appears likely a flag day will be required before 1.0,

I haven't seen any such flag day since 0.3. Repository format seems
stable, except rename and modes support (these will be added in a
compatible way I think).
0.8 release is imminent (today or tomorrow).

I personally wouldn't mind git - it's great.

The only drawback is local cloning. This operation is like 4x slower
than plain copying of the repository. Probably because it works like an
ssh clone - creates a pack, copies it, then unpacks. This is just
inefficient on a local machine.

> and bzr because they didn't focus on repository format, and have
> suggested that they will switch to a hash-addressed scheme at some point
> in the future...
>   

Not only that - they don't have an efficient network transfer protocol.
(they use HTTP walkers, not even supporting persistent connections and
also do too many DNS lookups)
This is very unfortunate, especially for large projects.
(branching Linux would take 3 days I think)

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm




* Re: [Census] So who uses git?
  2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
  2006-01-29  2:14         ` Morten Welinder
@ 2006-01-29 10:09         ` Keith Packard
  2006-01-29 11:18           ` Radoslaw Szkodzinski
  2006-01-29 18:37         ` Dave Jones
  2006-01-30 18:58         ` Carl Baldwin
  3 siblings, 1 reply; 84+ messages in thread
From: Keith Packard @ 2006-01-29 10:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: cworth, keithp, Martin Langhoff, Linus Torvalds, Git Mailing List


On Sat, 2006-01-28 at 13:08 -0800, Junio C Hamano wrote:
> Keith Packard <keithp@keithp.com> writes:
> 
> > Once we're happy with the import, I'm pretty sure we'll just switch
> > cairo over to git and dump the CVS bits. X.org is a harder case, for
> > that I suspect we'll migrate individual modules over one at a time,
> > perhaps starting with the core X server pieces so that I can get my work
> > done, have it published in the main repository and not have it also
> > break everyone else's X server.
> 
> Wow.......  You are switching Cairo and X.org from CVS to git?

We're not switching 'X.org', we're switching the X server core. X.org is
now broken into many separate projects, and each one will get to choose
SCM on their own. I expect to migrate the ones I maintain and use to
git, but migration of the dead code is unlikely to ever happen (and
there's lots of dead code) 

> It could be that anything is better than CVS these days, but I
> have to admit that my jaw dropped after reading this, primarily
> because I have never touched anything as big as X.
> 
> Awestruck, dumbstruck,... Xstruck.  Yeah, I know I should have
> more faith in git.  Earlier I heard Wine folks are running git
> in parallel with CVS as their dual primary SCM now, and of
> course git is the primary SCM for the Linux kernel project.
> 
> For things like the source code management, it takes a new
> software to be at least 10 times as good as the one that has
> been used, because switching _is_ a pain no matter how well tool
> helps the transition.  You have to transition not just the
> repository, but people who interact with it.

Fortunately, there are very few people involved with any specific piece
of the X.org distribution; there's really only one or two people
actively developing the X.org core server, so that part of the migration
will be easy. Our users will be stuck, but there aren't many of them
either, and git makes just sucking the current bits pretty easy. 
 
> When the Linux kernel switched, it was not that hard to be
> infinitely better than the previous one.  Because the previous
> one was no longer available to the kernel community; git did not
> have to be 10 times better on technical merits alone when the
> transition happened.

git really does look 10x better than CVS at this point; mostly social
issues are now blocking X development as weaker developers are refused
access to source code management to protect the project from damage. git
eliminates that barrier, and should let many new developers experiment
and share their results without affecting my work

> Can I hear experiences from other big projects that tried to use
> git [*1*]?  I suspect there are many that have tried, and I
> would not be surprised at all if git did not work out well for
> them.  For projects that already run on a (free) SCM, I would be
> very surprised if the developers find the current git 10 times
> better than the SCM they have been using (probably with an
> exception of CVS), unless they have very specific need, such as
> parallel development of distributed nature like the Linux
> kernel.

Everyone *wants* parallel distributed development, CVS prevents it.
And, remember that this is *not* a huge project, the core X server is
only 2M lines of source code. We separate out all of the drivers,
libraries and applications. Doing the migration in pieces allows us to
incrementally affect developers, and repair issues without suspending
all development.

I don't know of other huge projects moving to git; it's not all that
interesting as we know the tool is stable and will scale to support our
project already. Also, hg and bzr are not ready for production use in my
opinion; hg as it appears likely a flag day will be required before 1.0,
and bzr because they didn't focus on repository format, and have
suggested that they will switch to a hash-addressed scheme at some point
in the future...
  
-- 
keith.packard@intel.com


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-29  2:14         ` Morten Welinder
@ 2006-01-29  3:53           ` Junio C Hamano
  2006-01-29 14:19             ` Morten Welinder
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2006-01-29  3:53 UTC (permalink / raw)
  To: Morten Welinder; +Cc: git

Morten Welinder <mwelinder@gmail.com> writes:

>> Can I hear experiences from other big projects that tried to use
>> git [*1*]?  I suspect there are many that have tried, and I
>> would not be surprised at all if git did not work out well for
>> them.
>
> I've been playing with Gnumeric under git.
> ...
> We haven't switched yet, but I expect that we will...

I might have sounded as if I was looking for failure reports, but
success stories are of course welcome ;-).  It's always good to
hear their git experiences first-hand from people in the top
echelon of public projects.

> 270M is about 40% of the cvs repository size.  Given
> compression I would have expected bigger savings.

I think that 40% sounds about right.  My understanding of the
underlying format CVS uses, RCS, is that it stores a full copy
of the tip of trunk uncompressed, and other versions of the file
are represented as incremental deltas from that.  The packed git
format does not favor any particular version based on its
distance from the tip, and stores either a compressed full copy,
or a delta from some other revision (which may not necessarily
be represented as a full copy).  When we store something as a
delta from something else, we limit the length of the delta
chain back to a full copy to 10 (by default), so that you can
get to a specific object with at most 10 applications of delta
on top of a full copy.
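
To illustrate the depth limit described above, here is a toy sketch
in Python.  It is not git's pack format: the prefix-based delta, the
store/load helpers, and the choice to delta each version against the
one stored just before it are all assumptions made for illustration
(a real pack picks its bases differently).  It only shows how capping
the chain at 10 forces a periodic full copy and bounds the work needed
to rebuild any version.

```python
# Toy model only; none of these names come from git itself.
MAX_DEPTH = 10  # chain limit described in the text (the default)


def make_delta(base, target):
    """Hypothetical delta: length of the shared prefix plus the new tail."""
    n = 0
    limit = min(len(base), len(target))
    while n < limit and base[n] == target[n]:
        n += 1
    return (n, target[n:])


def apply_delta(base, delta):
    """Rebuild the target from its base and a (prefix_len, tail) delta."""
    keep, tail = delta
    return base[:keep] + tail


def load(entries, idx):
    """Rebuild version idx by walking its delta chain back to a full copy."""
    kind, payload, _depth = entries[idx]
    if kind == "full":
        return payload
    base_idx, delta = payload
    return apply_delta(load(entries, base_idx), delta)


def store(versions):
    """Store versions in order; start a new full copy once a chain is MAX_DEPTH long."""
    entries = []  # ("full", text, 0) or ("delta", (base_idx, delta), depth)
    for i, text in enumerate(versions):
        if i == 0 or entries[i - 1][2] >= MAX_DEPTH:
            entries.append(("full", text, 0))
        else:
            base = load(entries, i - 1)
            depth = entries[i - 1][2] + 1
            entries.append(("delta", (i - 1, make_delta(base, text)), depth))
    return entries


# 25 versions of a file that grows by one line per revision.
versions = ["".join("line %d\n" % j for j in range(i + 1)) for i in range(25)]
entries = store(versions)
full_copies = [i for i, e in enumerate(entries) if e[0] == "full"]
max_depth = max(e[2] for e in entries)
```

With this input, no version needs more than MAX_DEPTH delta
applications, at the cost of an extra full copy every eleventh entry.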

Comparing these two formats for storage efficiency is tricky:

 - A full copy of the version at the tip in CVS is not
   compressed but in git a full copy is compressed -- zlib gives
   50% for typical text sources -- so git has some advantage here.

 - Because of the delta-chain length limit, we store a full copy,
   albeit compressed [*1*], every ten or so versions.  This
   trades off storage efficiency for run-time efficiency.

 - CVS storage records most things as deltas for a long-lived
   project, and deltas are less compressible (IOW, you could
   think of them as already compressed somewhat), so it is not
   _that_ inefficient to begin with.

 - In git, the delta representation is used only when
   representing something as a delta from something else buys us
   enough space reduction compared to compressing it as a full
   copy.  This is a pure improvement over the CVS format.
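
The last point can be sketched the same way.  This is a hedged
illustration in Python, not what git-pack-objects actually does: the
common-prefix delta, the choose_representation helper and its
min_saving threshold are all hypothetical.  It only shows the idea of
keeping a delta when it is clearly smaller than the zlib-compressed
full copy, and falling back to the full copy otherwise.

```python
import zlib


def common_prefix_delta(base, target):
    """Hypothetical delta: 4-byte shared-prefix length, then the new tail."""
    n = 0
    limit = min(len(base), len(target))
    while n < limit and base[n] == target[n]:
        n += 1
    return n.to_bytes(4, "big") + target[n:]


def choose_representation(base, target, min_saving=0.5):
    """Pick "delta" only if it is at most min_saving times the full size.

    min_saving is an assumed knob for this sketch, not a git setting.
    """
    full = zlib.compress(target)
    delta = zlib.compress(common_prefix_delta(base, target))
    if len(delta) <= len(full) * min_saving:
        return "delta", delta
    return "full", full


old = b"".join(b"line %d of the file\n" % i for i in range(500))
new = old + b"one more line\n"        # small change on top of old
unrelated = b"x" * 2000               # shares no prefix with new

kind_similar, _ = choose_representation(old, new)
kind_unrelated, _ = choose_representation(unrelated, new)
```

Here the small append is kept as a delta, while a base that shares
nothing with the target makes the compressed full copy the better
choice.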

[Footnote]

*1* You could make a different trade-off by using the --depth
flag when running git-pack-objects.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Census] So who uses git?
  2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
@ 2006-01-29  2:14         ` Morten Welinder
  2006-01-29  3:53           ` Junio C Hamano
  2006-01-29 10:09         ` Keith Packard
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 84+ messages in thread
From: Morten Welinder @ 2006-01-29  2:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

> Can I hear experiences from other big projects that tried to use
> git [*1*]?  I suspect there are many that have tried, and I
> would not be surprised at all if git did not work out well for
> them.

I've been playing with Gnumeric under git.

-rw-rw-r--    1 welinder research     270M Nov  5 09:46 gnumeric/.git/objects/pack/pack-91291de5477ddd06545b052460239b3dae89ad72.pack

270M is about 40% of the cvs repository size.  Given
compression I would have expected bigger savings.

Conversion isn't perfect, probably because the cvs tree has
seen some hacking over the years.  (I am not posting the URL
because I don't want to kill gnome.org.)

We haven't switched yet, but I expect that we will.  We are
looking for (in no particular order):

1. Offline history.
2. Patch sets and other things that'll make it easier to maintain
    more than one branch.

In other words, pretty much anything but cvs will fit the bill. :-/

M.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* [Census] So who uses git?
  2006-01-28 11:00     ` Keith Packard
@ 2006-01-28 21:08       ` Junio C Hamano
  2006-01-29  2:14         ` Morten Welinder
                           ` (3 more replies)
  0 siblings, 4 replies; 84+ messages in thread
From: Junio C Hamano @ 2006-01-28 21:08 UTC (permalink / raw)
  To: Keith Packard; +Cc: Martin Langhoff, Linus Torvalds, Git Mailing List

Keith Packard <keithp@keithp.com> writes:

> Once we're happy with the import, I'm pretty sure we'll just switch
> cairo over to git and dump the CVS bits. X.org is a harder case, for
> that I suspect we'll migrate individual modules over one at a time,
> perhaps starting with the core X server pieces so that I can get my work
> done, have it published in the main repository and not have it also
> break everyone else's X server.

Wow.......  You are switching Cairo and X.org from CVS to git?

It could be that anything is better than CVS these days, but I
have to admit that my jaw dropped after reading this, primarily
because I have never touched anything as big as X.

Awestruck, dumbstruck,... Xstruck.  Yeah, I know I should have
more faith in git.  Earlier I heard Wine folks are running git
in parallel with CVS as their dual primary SCM now, and of
course git is the primary SCM for the Linux kernel project.

For things like source code management, new software has to be
at least 10 times as good as the one that has been used, because
switching _is_ a pain no matter how well the tool helps the
transition.  You have to transition not just the repository, but
the people who interact with it.

When the Linux kernel switched, it was not that hard to be
infinitely better than the previous one.  Because the previous
one was no longer available to the kernel community; git did not
have to be 10 times better on technical merits alone when the
transition happened.

Can I hear experiences from other big projects that tried to use
git [*1*]?  I suspect there are many that have tried, and I
would not be surprised at all if git did not work out well for
them.  For projects that already run on a (free) SCM, I would be
very surprised if the developers find the current git 10 times
better than the SCM they have been using (probably with an
exception of CVS), unless they have very specific need, such as
parallel development of distributed nature like the Linux
kernel.

I do not search the mailing list archives as often as I would
like, but I have seen some projects try git and go back to CVS.
We would learn much from our failures to support them -- what
those people found lacking.


[Footnote]

*1* Please limit yourselves to reasonably well-known "it is
surprising you haven't heard of this project" kind...

^ permalink raw reply	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2006-02-06 21:15 UTC | newest]

Thread overview: 84+ messages
2006-02-01  7:08 [Census] So who uses git? linux
2006-02-01  8:51 ` Junio C Hamano
2006-02-01 16:04 ` Linus Torvalds
2006-02-01 16:10 ` Alex Riesen
2006-02-01 21:27   ` linux
  -- strict thread matches above, loose matches on Subject: below --
2006-01-26  2:10 LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors) Martin Langhoff
2006-01-28  4:47 ` Linus Torvalds
2006-01-28  5:33   ` Martin Langhoff
2006-01-28 11:00     ` Keith Packard
2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
2006-01-29  2:14         ` Morten Welinder
2006-01-29  3:53           ` Junio C Hamano
2006-01-29 14:19             ` Morten Welinder
2006-01-29 20:15               ` Junio C Hamano
2006-01-29 10:09         ` Keith Packard
2006-01-29 11:18           ` Radoslaw Szkodzinski
2006-01-29 18:12             ` Greg KH
2006-01-31 18:33               ` Radoslaw Szkodzinski
2006-01-31 19:50                 ` Radoslaw Szkodzinski
2006-01-31 20:43                   ` Junio C Hamano
2006-01-31 21:02                     ` Radoslaw Szkodzinski
2006-01-30 22:51             ` Alex Riesen
2006-01-31 21:25               ` Linus Torvalds
2006-01-31 21:52                 ` J. Bruce Fields
2006-01-31 22:01                 ` Alex Riesen
     [not found]                   ` <20060201013901.GA16832@mail.com>
2006-02-01  2:04                     ` Linus Torvalds
2006-02-01  2:09                       ` Linus Torvalds
2006-02-01  2:31                       ` Junio C Hamano
2006-02-01  3:43                         ` Linus Torvalds
2006-02-01  7:03                           ` Junio C Hamano
     [not found]                         ` <20060201045337.GC25753@mail.com>
2006-02-01  5:04                           ` Linus Torvalds
2006-02-01  5:42                           ` Junio C Hamano
2006-02-01 16:15                       ` Jason Riedy
2006-02-01 19:20                       ` Julian Phillips
2006-02-01 19:29                         ` Linus Torvalds
2006-02-06 21:15                       ` Chuck Lever
2006-02-01  2:52                     ` Martin Langhoff
2006-02-01  3:48                       ` Linus Torvalds
2006-02-01 19:30                         ` H. Peter Anvin
2006-02-01 14:55                       ` Alex Riesen
2006-02-01 16:25                         ` Linus Torvalds
2006-02-02  9:12                           ` Alex Riesen
2006-01-29 18:37         ` Dave Jones
2006-01-29 20:17           ` Daniel Barkalow
2006-01-29 20:29             ` Martin Langhoff
2006-01-30 15:23             ` Mike McCormack
2006-01-30 18:58         ` Carl Baldwin
2006-01-31 10:27           ` Johannes Schindelin
2006-01-31 15:24             ` Carl Baldwin
2006-01-31 15:31               ` Johannes Schindelin
2006-01-31 17:30             ` Linus Torvalds
2006-01-31 18:12               ` J. Bruce Fields
2006-01-31 19:33                 ` Junio C Hamano
2006-01-31 19:44                   ` Jon Loeliger
2006-01-31 19:52                     ` Junio C Hamano
     [not found]                     ` <7vd5i8w2nc.fsf@assigned-by-dhcp.cox.net>
2006-01-31 20:56                       ` J. Bruce Fields
2006-01-31 20:06                   ` J. Bruce Fields
2006-01-31 19:01               ` Keith Packard
2006-01-31 19:21                 ` Linus Torvalds
2006-01-31 22:55                   ` Joel Becker
2006-02-01 14:43                     ` Johannes Schindelin
2006-01-31 20:56                 ` Sam Ravnborg
2006-01-31 22:21                   ` Junio C Hamano
2006-02-01 19:34               ` H. Peter Anvin
2006-01-31 23:16             ` Daniel Barkalow
2006-01-31 23:36               ` Petr Baudis
2006-01-31 23:47               ` Junio C Hamano
2006-02-01  0:38                 ` Linus Torvalds
2006-02-01  0:52                   ` Junio C Hamano
2006-02-01  2:19                   ` Daniel Barkalow
2006-02-01  6:42                   ` Junio C Hamano
2006-02-01  7:22                     ` Carl Worth
2006-02-01  8:26                       ` Junio C Hamano
2006-02-01  9:59                         ` Randal L. Schwartz
2006-02-01 20:48                           ` Junio C Hamano
2006-02-01 17:11                     ` Linus Torvalds
2006-02-01 17:18                     ` Nicolas Pitre
2006-02-01 20:27                       ` Junio C Hamano
2006-02-01 21:09                         ` Linus Torvalds
2006-02-01 21:34                           ` Nicolas Pitre
2006-02-01 21:59                           ` Junio C Hamano
2006-02-01 22:25                             ` Nicolas Pitre
2006-02-01 22:50                               ` Junio C Hamano
2006-02-02 14:59                                 ` Andreas Ericsson
2006-02-01 22:35                             ` Linus Torvalds
2006-02-01 22:57                             ` Daniel Barkalow
2006-02-01 22:00                         ` Joel Becker
2006-02-01 19:32           ` H. Peter Anvin
