LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors)

* LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors)
@ 2006-01-26  2:10 Martin Langhoff
  2006-01-28  4:47 ` Linus Torvalds
  0 siblings, 1 reply; 110+ messages in thread
From: Martin Langhoff @ 2006-01-26  2:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On 1/26/06, Linus Torvalds <torvalds@osdl.org> wrote:

> If we get an error parsing the arguments, exit.

This bug found thanks to the 'demo' effect. ;-)

The workshop had a 2hr slot -- after 2h 15m, I asked Linus if he
wanted to talk about the internals. He did, and the workshop went
on... for 2 hours more. It was actually hard to get people out of the
room.

Sadly, not many people actually played along on their laptop. Those
who did got an extra bit of help to migrate their preexisting CVS/SVN
repos ;-) (thanks to Sam Vilain for all the help!)

I'll upload the presentation material soon -- very similar to the
stuff I used @ Wellington Perl Mongers. Still text-based; given all
the talk about plumbing and porcelain, I steadfastly refuse to add
imagery.

During the presentation someone mentioned errors when running
git-cvsimport which I'm keen on hearing more about.

cheers,

m

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors)
  2006-01-26  2:10 LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors) Martin Langhoff
@ 2006-01-28  4:47 ` Linus Torvalds
  2006-01-28  5:33   ` Martin Langhoff
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2006-01-28  4:47 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Junio C Hamano, Git Mailing List



On Thu, 26 Jan 2006, Martin Langhoff wrote:
> 
> During the presentation someone mentioned errors when running
> git-cvsimport which I'm keen on hearing more about.

Martin, I talked to Keith, and apparently you fixed some cvsimport problem 
they had with Cairo during dinner last night? Was that something that 
could have affected other people, or was it very specific to whatever 
Cairo CVS insanity? I've not seen any messages from you on it..

		Linus

* Re: LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors)
  2006-01-28  4:47 ` Linus Torvalds
@ 2006-01-28  5:33   ` Martin Langhoff
  2006-01-28  5:53     ` Linus Torvalds
  2006-01-28 11:00     ` Keith Packard
  0 siblings, 2 replies; 110+ messages in thread
From: Martin Langhoff @ 2006-01-28  5:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, keithp

On 1/28/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > During the presentation someone mentioned errors when running
> > git-cvsimport which I'm keen on hearing more about.
>
> Martin, I talked to Keith, and apparently you fixed some cvsimport problem
> they had with Cairo during dinner last night? Was that something that
> could have affected other people, or was it very specific to whatever
> Cairo CVS insanity? I've not seen any messages from you on it..

I've got a few small improvements to cvsimport in my laptop that I'll
push out for Junio to merge as soon as I get back to the office. I've
run "99% successful" imports of cairo and of x.org (modular and
monolithic) with all their branches and tags. It isn't literally the
20 years of commits Jim talked initially about -- cvs holds just the
last ~5 years.

The repos *are* a bit broken -- files missing (not moved, but really
missing) so some of the fixes are to make it easier to discover where
it is dying and work around it. There are a few more things that I need
to debug in cvsimport -- there's a small delta between what I should
have and what I do have. As soon as they are 100% right I'll put them
on http://locke.catalyst.net.nz/gitweb for the X.org team to have a
look at them -- and a cronjob to keep them up to date with official
CVS.

BTW, have you still got that patch to git-merge to seed the commit msg
with conflicted files? ;-)

cheers,

m

* Re: LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors)
  2006-01-28  5:33   ` Martin Langhoff
@ 2006-01-28  5:53     ` Linus Torvalds
  2006-01-28  6:32       ` Junio C Hamano
  2006-01-29 10:12       ` Fredrik Kuivinen
  2006-01-28 11:00     ` Keith Packard
  1 sibling, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-01-28  5:53 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Junio C Hamano, Git Mailing List, keithp



On Sat, 28 Jan 2006, Martin Langhoff wrote:
> 
> BTW, have you still got that patch to git-merge to seed the commit msg
> with conflicted files? ;-)

Nope. But it was something like the appended (totally untested, and 
slightly improved).

The point being that we'd fill in a template that the committer will 
hopefully edit to explain what he did to fix up the merge for each file 
that had conflicts.

		Linus

---
diff --git a/git-merge.sh b/git-merge.sh
index 0a158ef..9f828f3 100755
--- a/git-merge.sh
+++ b/git-merge.sh
@@ -301,5 +301,9 @@ then
 	"Automatic merge went well; stopped before committing as requested"
 	exit 0
 else
+	echo >>"$GIT_DIR/MERGE_MSG"
+	echo "Conflicts in" >>"$GIT_DIR/MERGE_MSG"
+	git-ls-files --unmerged | cut -f2 | uniq |
+		sed 's/^.*/    &:/' >>"$GIT_DIR/MERGE_MSG"
 	die "Automatic merge failed; fix up by hand"
 fi
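
For illustration, the pipeline in the patch can be tried on canned input.
This is a minimal sketch: conflict_template and sample are hypothetical
helpers, not part of git, and the records only imitate the
"<mode> <object> <stage><TAB><path>" layout of git-ls-files --unmerged,
with fake object names.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): read ls-files --unmerged style
# records on stdin and print a commit-message template on stdout.
conflict_template() {
	echo "Conflicts in"
	# The path is the field after the tab; it repeats once per stage,
	# so uniq collapses the repeats before sed indents each path.
	cut -f2 | uniq | sed 's/^.*$/    &:/'
}

# Made-up records: two conflicted paths, several stages each.
sample() {
	printf '100644 1111111 1\tMakefile\n'
	printf '100644 2222222 2\tMakefile\n'
	printf '100644 3333333 3\tMakefile\n'
	printf '100644 4444444 2\tsrc/main.c\n'
	printf '100644 5555555 3\tsrc/main.c\n'
}

template=$(sample | conflict_template)
printf '%s\n' "$template"
```

Each conflicted path ends up on its own indented line ending in a colon,
ready for the committer to describe the fix-up next to it.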

* Re: LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors)
  2006-01-28  5:53     ` Linus Torvalds
@ 2006-01-28  6:32       ` Junio C Hamano
  2006-01-29 10:12       ` Fredrik Kuivinen
  1 sibling, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-01-28  6:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Langhoff, Git Mailing List, keithp

Linus Torvalds <torvalds@osdl.org> writes:

> The point being that we'd fill in a template that the committer will 
> hopefully edit to explain what he did to fix up the merge for each file 
> that had conflicts.

That is a sound idea from the point of view of good practice.

While on the topic of conflicting merge, I've been wondering if
it would make sense to do the "combined diff" between stage 2,
stage 3 and the working tree file, in addition to the --ours and
--theirs enhancements you added lately.

This would let you sanity check the merge you _could_ commit, in
the same format you would see later when you examine the merge
commit.

* Re: LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors)
  2006-01-28  5:33   ` Martin Langhoff
  2006-01-28  5:53     ` Linus Torvalds
@ 2006-01-28 11:00     ` Keith Packard
  2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
  1 sibling, 1 reply; 110+ messages in thread
From: Keith Packard @ 2006-01-28 11:00 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: keithp, Linus Torvalds, Junio C Hamano, Git Mailing List

On Sat, 2006-01-28 at 18:33 +1300, Martin Langhoff wrote:

> I've got a few small improvements to cvsimport in my laptop that I'll
> push out for Junio to merge as soon as I get back to the office. I've
> run "99% successful" imports of cairo and of x.org (modular and
> monolithic) with all their branches and tags. It isn't literally the
> 20 years of commits Jim talked initially about -- cvs holds just the
> last ~5 years.

Yeah, X CVS is a scattered mess at present. I think it would be better
to just leave that mess alone and grab a reasonably recent chunk of it
to put into a GIT repository. Save a bunch of space too. We also haven't
quite finished all of the recovery needed to span the whole twenty years
yet.

Carl and I hacked at the tool a bit to pull apart our ChangeLog-based
commit messages; extracting email addresses and separating the commit
messages from the (now useless) list of affected files.  

We're getting clean cairo imports now, there are a few weirdnesses
around branches that we've seen -- one commit appears on both the branch
and trunk for some reason.

Once we're happy with the import, I'm pretty sure we'll just switch
cairo over to git and dump the CVS bits. X.org is a harder case, for
that I suspect we'll migrate individual modules over one at a time,
perhaps starting with the core X server pieces so that I can get my work
done, have it published in the main repository and not have it also
break everyone else's X server.

I'm not sure we'll need ongoing synchronization with existing X.org CVS
for long; there aren't any other developers doing any significant
changes to this part of the system, so we can abandon the losers with no
remorse.

-- 
keith.packard@intel.com

* [Census] So who uses git?
  2006-01-28 11:00     ` Keith Packard
@ 2006-01-28 21:08       ` Junio C Hamano
  2006-01-29  2:14         ` Morten Welinder
                           ` (3 more replies)
  0 siblings, 4 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-01-28 21:08 UTC (permalink / raw)
  To: Keith Packard; +Cc: Martin Langhoff, Linus Torvalds, Git Mailing List

Keith Packard <keithp@keithp.com> writes:

> Once we're happy with the import, I'm pretty sure we'll just switch
> cairo over to git and dump the CVS bits. X.org is a harder case, for
> that I suspect we'll migrate individual modules over one at a time,
> perhaps starting with the core X server pieces so that I can get my work
> done, have it published in the main repository and not have it also
> break everyone else's X server.

Wow.......  You are switching Cairo and X.org from CVS to git?

It could be that anything is better than CVS these days, but I
have to admit that my jaw dropped after reading this, primarily
because I have never touched anything as big as X.

Awestruck, dumbstruck,... Xstruck.  Yeah, I know I should have
more faith in git.  Earlier I heard Wine folks are running git
in parallel with CVS as their dual primary SCM now, and of
course git is the primary SCM for the Linux kernel project.

For things like source code management, new software has to be
at least 10 times as good as what has been used, because
switching _is_ a pain no matter how well the tool helps the
transition.  You have to transition not just the repository, but
the people who interact with it.

When the Linux kernel switched, it was not that hard to be
infinitely better than the previous one.  Because the previous
one was no longer available to the kernel community; git did not
have to be 10 times better on technical merits alone when the
transition happened.

Can I hear experiences from other big projects that tried to use
git [*1*]?  I suspect there are many that have tried, and I
would not be surprised at all if git did not work out well for
them.  For projects that already run on a (free) SCM, I would be
very surprised if the developers find the current git 10 times
better than the SCM they have been using (probably with an
exception of CVS), unless they have very specific need, such as
parallel development of distributed nature like the Linux
kernel.

I do not search the mailing lists as often as I would like, but
I have seen some projects try git and go back to CVS.  We would
learn much from our failures to support them -- what those
people found lacking.

[Footnote]

*1* Please limit yourselves to reasonably well-known "it is
surprising you haven't heard of this project" kind...

* Re: [Census] So who uses git?
  2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
@ 2006-01-29  2:14         ` Morten Welinder
  2006-01-29  3:53           ` Junio C Hamano
  2006-01-29 10:09         ` Keith Packard
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 110+ messages in thread
From: Morten Welinder @ 2006-01-29  2:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

> Can I hear experiences from other big projects that tried to use
> git [*1*]?  I suspect there are many that have tried, and I
> would not be surprised at all if git did not work out well for
> them.

I've been playing with Gnumeric under git.

-rw-rw-r--    1 welinder research     270M Nov  5 09:46
gnumeric/.git/objects/pack/pack-91291de5477ddd06545b052460239b3dae89ad72.pack

270M is about 40% of the cvs repository size.  Given
compression I would have expected bigger savings.

Conversion isn't perfect, probably because the cvs tree has
seen some hacking over the years.  (I am not posting the URL
because I don't want to kill gnome.org.)

We haven't switched yet, but I expect that we will.  We are
looking for (in no particular order):

1. Offline history.
2. Patch sets and other things that'll make it easier to maintain
    more than one branch.

In other words, pretty-much anything but cvs will fit the bill, :-./

M.

* Re: [Census] So who uses git?
  2006-01-29  2:14         ` Morten Welinder
@ 2006-01-29  3:53           ` Junio C Hamano
  2006-01-29 14:19             ` Morten Welinder
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2006-01-29  3:53 UTC (permalink / raw)
  To: Morten Welinder; +Cc: git

Morten Welinder <mwelinder@gmail.com> writes:

>> Can I hear experiences from other big projects that tried to use
>> git [*1*]?  I suspect there are many that have tried, and I
>> would not be surprised at all if git did not work out well for
>> them.
>
> I've been playing with Gnumeric under git.
> ...
> We haven't switched yet, but I expect that we will...

I might have sounded as if I was looking for failure report, but
success stories are of course welcome ;-).  It's always good to
hear their git experiences first-hand from people in the top
echelon of public projects.

> 270M is about 40% of the cvs repository size.  Given
> compression I would have expected bigger savings.

I think that 40% sounds about right.  My understanding of the
underlying format CVS uses, RCS, is that it stores a full copy
of the tip of trunk uncompressed, and other versions of the file
are represented as incremental delta from that.  The packed git
format does not favor particular version based on the distance
from the tip, and stores either a compressed full copy, or a
delta from some other revision (which may not necessarily be
represented as a full copy).  When we store something as a delta
from something else, we limit the length of the delta chain to a
full copy to 10 (by default), so that you can get to a specific
object with at most 10 applications of delta on top of a full
copy.

Comparing these two formats for storage efficiency is tricky:

 - A full copy of the version at the tip in CVS is not
   compressed but in git a full copy is compressed -- zlib gives
   50% for typical text sources -- git has some advantage here.

 - Because of the delta-chain length limit, we store a full copy,
   albeit compressed [*1*], every ten or so versions.  This
   trades off storage efficiency for run-time efficiency.

 - CVS storage records most things as deltas for a long-lived
   project, and deltas are less compressible (IOW, you could
   think of them as already compressed somewhat), so it is not
   _that_ inefficient to begin with.

 - Delta representation is used in git only when representing
   something as a delta from something else saves more space
   than compressing it as a full copy.  This is a pure
   improvement over the CVS format.

[Footnote]

*1* You could make a different trade-off by using the --depth
flag when running git-pack-objects.

* Re: [Census] So who uses git?
  2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
  2006-01-29  2:14         ` Morten Welinder
@ 2006-01-29 10:09         ` Keith Packard
  2006-01-29 11:18           ` Radoslaw Szkodzinski
  2006-01-29 18:37         ` Dave Jones
  2006-01-30 18:58         ` Carl Baldwin
  3 siblings, 1 reply; 110+ messages in thread
From: Keith Packard @ 2006-01-29 10:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: cworth, keithp, Martin Langhoff, Linus Torvalds, Git Mailing List

On Sat, 2006-01-28 at 13:08 -0800, Junio C Hamano wrote:
> Keith Packard <keithp@keithp.com> writes:
> 
> > Once we're happy with the import, I'm pretty sure we'll just switch
> > cairo over to git and dump the CVS bits. X.org is a harder case, for
> > that I suspect we'll migrate individual modules over one at a time,
> > perhaps starting with the core X server pieces so that I can get my work
> > done, have it published in the main repository and not have it also
> > break everyone else's X server.
> 
> Wow.......  You are switching Cairo and X.org from CVS to git?

We're not switching 'X.org', we're switching the X server core. X.org is
now broken into many separate projects, and each one will get to choose
SCM on their own. I expect to migrate the ones I maintain and use to
git, but migration of the dead code is unlikely to ever happen (and
there's lots of dead code) 

> It could be that anything is better than CVS these days, but I
> have to admit that my jaw dropped after reading this, primarily
> because I have never touched anything as big as X.
> 
> Awestruck, dumbstruck,... Xstruck.  Yeah, I know I should have
> more faith in git.  Earlier I heard Wine folks are running git
> in parallel with CVS as their dual primary SCM now, and of
> course git is the primary SCM for the Linux kernel project.
> 
> For things like the source code management, it takes a new
> software to be at least 10 times as good as the one that has
> been used, because switching _is_ a pain no matter how well tool
> helps the transition.  You have to transition not just the
> repository, but people who interact with it.

Fortunately, there are very few people involved with any specific piece
of the X.org distribution; there's really only one or two people
actively developing the X.org core server, so that part of the migration
will be easy. Our users will be stuck, but there aren't many of them
either, and git makes just sucking the current bits pretty easy. 

> When the Linux kernel switched, it was not that hard to be
> infinitely better than the previous one.  Because the previous
> one was no longer available to the kernel community; git did not
> have to be 10 times better on technical merits alone when the
> transition happened.

git really does look 10x better than CVS at this point; mostly social
issues are now blocking X development as weaker developers are refused
access to source code management to protect the project from damage. git
eliminates that barrier, and should let many new developers experiment
and share their results without affecting my work

> Can I hear experiences from other big projects that tried to use
> git [*1*]?  I suspect there are many that have tried, and I
> would not be surprised at all if git did not work out well for
> them.  For projects that already run on a (free) SCM, I would be
> very surprised if the developers find the current git 10 times
> better than the SCM they have been using (probably with an
> exception of CVS), unless they have very specific need, such as
> parallel development of distributed nature like the Linux
> kernel.

Everyone *wants* parallel distributed development, CVS prevents it.
And, remember that this is *not* a huge project, the core X server is
only 2M lines of source code. We separate out all of the drivers,
libraries and applications. Doing the migration in pieces allows us to
incrementally affect developers, and repair issues without suspending
all development.

I don't know of other huge projects moving to git; it's not all that
interesting as we know the tool is stable and will scale to support our
project already. Also, hg and bzr are not ready for production use in my
opinion; hg as it appears likely a flag day will be required before 1.0,
and bzr because they didn't focus on repository format, and have
suggested that they will switch to a hash-addressed scheme at some point
in the future...

-- 
keith.packard@intel.com

* Re: LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors)
  2006-01-28  5:53     ` Linus Torvalds
  2006-01-28  6:32       ` Junio C Hamano
@ 2006-01-29 10:12       ` Fredrik Kuivinen
  2006-01-29 20:15         ` Junio C Hamano
  1 sibling, 1 reply; 110+ messages in thread
From: Fredrik Kuivinen @ 2006-01-29 10:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Langhoff, Junio C Hamano, Git Mailing List, keithp

On Sat, Jan 28, 2006 at 12:53:31AM -0500, Linus Torvalds wrote:
> 
> 
> On Sat, 28 Jan 2006, Martin Langhoff wrote:
> > 
> > BTW, have you still got that patch to git-merge to seed the commit msg
> > with conflicted files? ;-)
> 
> Nope. But it was something like the appended (totally untested, and 
> slightly improved).
> 
> The point being that we'd fill in a template that the committer will 
> hopefully edit to explain what he did to fix up the merge for each file 
> that had conflicts.
> 

Would it make sense to add an optional

   mergeresult <tree>

line to merge commit objects? Here <tree> is supposed to be a SHA1 of
the tree object which corresponds to the result of the automatic part
of a merge. Hence, for a given merge commit which had conflicts
"git-diff-tree <commit SHA1> <mergeresult SHA1>" would give a diff
which shows the changes that were applied to resolve the conflict.

When the recursive merge strategy is used we actually write the
'mergeresult' tree object to the object database, so this thing should
be straightforward to implement in that case. If there is interest it
could be implemented for the resolve strategy too.

I think those mergeresult lines might be useful when implementing
git-annotate across merges too. It makes it easy to distinguish
changes which came from the merged branches and changes introduced in
the merge itself.

It would not be backwards compatible with the current git though...

- Fredrik

* Re: [Census] So who uses git?
  2006-01-29 10:09         ` Keith Packard
@ 2006-01-29 11:18           ` Radoslaw Szkodzinski
  2006-01-29 18:12             ` Greg KH
  2006-01-30 22:51             ` Alex Riesen
  0 siblings, 2 replies; 110+ messages in thread
From: Radoslaw Szkodzinski @ 2006-01-29 11:18 UTC (permalink / raw)
  To: Keith Packard
  Cc: Junio C Hamano, cworth, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Keith Packard wrote:
> Fortunately, there are very few people involved with any specific piece
> of the X.org distribution; there's really only one or two people
> actively developing the X.org core server, so that part of the migration
> will be easy. Our users will be stuck, but there aren't many of them
> either, and git makes just sucking the current bits pretty easy. 
>  

Not under Windows (bleh), but its support for Cygwin is getting better
and better.

> I don't know of other huge projects moving to git; it's not all that
> interesting as we know the tool is stable and will scale to support our
> project already. Also, hg and bzr are not ready for production use in my
> opinion; hg as it appears likely a flag day will be required before 1.0,

I haven't seen any such flag day since 0.3. Repository format seems
stable, except rename and modes support (these will be added in a
compatible way I think).
0.8 release is imminent (today or tomorrow).

I personally wouldn't mind git - it's great.

The only drawback is local cloning. This operation is like 4x slower
than plain copying of the repository. Probably because it works like an
ssh clone - creates a pack, copies it, then unpacks. This is just
inefficient on a local machine.

> and bzr because they didn't focus on repository format, and have
> suggested that they will switch to a hash-addressed scheme at some point
> in the future...
>   

Not only that - they don't have an efficient network transfer protocol.
(they use HTTP walkers, not even supporting persistent connections and
also do too many DNS lookups)
This is very unfortunate, especially for large projects.
(branching Linux would take 3 days I think)

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm

* Re: [Census] So who uses git?
  2006-01-29  3:53           ` Junio C Hamano
@ 2006-01-29 14:19             ` Morten Welinder
  2006-01-29 20:15               ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Morten Welinder @ 2006-01-29 14:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> I think that 40% sounds about right.  My understanding of the
> underlying format CVS uses, RCS, is that it stores a full copy
> of the tip of trunk uncompressed, and other versions of the file
> are represented as incremental delta from that.  The packed git
> format does not favor particular version based on the distance
> from the tip, and stores either a compressed full copy, or a
> delta from some other revision (which may not necessarily be
> represented as a full copy).  When we store something as a delta
> from something else, we limit the length of the delta chain to a
> full copy to 10 (by default), so that you can get to a specific
> object with at most 10 applications of delta on top of a full
> copy.

If I understand this right, that means that for a log file (in this
case a ChangeLog file) that is appended to linearly as a
function of revision number, we have...

cvs: O(n) archive size
git: O(n*n) archive size

At least that is what we get if revision N is always deltad over
revision N-1.  A good deal could be saved if instead of dumping
a full copy every 10 revisions, that revision would instead be
deltad off an earlier revision, but I think it'll still be O(n*n).
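
A rough way to check that estimate is to count bytes under a toy model.
The sketch below assumes an append-only file that grows by a bytes per
revision, a flat cost of c bytes per stored delta, and a full copy every
depth+1 objects counting back from the tip. The function names and all
the numbers are made up for illustration, and real packs also compress
their full copies.

```shell
#!/bin/sh
# Toy model only: revision k of an append-only file is k*a bytes,
# and every stored delta costs a flat c bytes.

# RCS-like layout: one full copy of the tip plus N-1 deltas.
rcs_size() {	# usage: rcs_size N a c
	echo $(( $1 * $2 + ($1 - 1) * $3 ))
}

# Pack-like layout with a bounded chain: walking back from the tip,
# every (depth+1)-th object is stored as a full copy.
pack_size() {	# usage: pack_size N a c depth
	n=$1 a=$2 c=$3 period=$(( $4 + 1 ))
	total=0 k=$n
	while [ "$k" -ge 1 ]; do
		if [ $(( (n - k) % period )) -eq 0 ]; then
			total=$(( total + k * a ))	# full copy of revision k
		else
			total=$(( total + c ))		# delta against a neighbour
		fi
		k=$(( k - 1 ))
	done
	echo "$total"
}

for n in 110 220; do
	echo "N=$n rcs=$(rcs_size "$n" 1 1) pack=$(pack_size "$n" 1 1 10)"
done
```

Under these assumptions, doubling N roughly doubles the RCS-like figure
but more than triples the pack-like one, which is the quadratic growth
described above.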

(/me prepares for Linus chiming in and telling me I should not
keep ChangeLog files, :-)

M.

* Re: [Census] So who uses git?
  2006-01-29 11:18           ` Radoslaw Szkodzinski
@ 2006-01-29 18:12             ` Greg KH
  2006-01-31 18:33               ` Radoslaw Szkodzinski
  2006-01-30 22:51             ` Alex Riesen
  1 sibling, 1 reply; 110+ messages in thread
From: Greg KH @ 2006-01-29 18:12 UTC (permalink / raw)
  To: Radoslaw Szkodzinski
  Cc: Keith Packard, Junio C Hamano, cworth, Martin Langhoff,
	Linus Torvalds, Git Mailing List

On Sun, Jan 29, 2006 at 12:18:45PM +0100, Radoslaw Szkodzinski wrote:
> 
> The only drawback is local cloning. This operation is like 4x slower
> than plain copying of the repository. Probably because it works like an
> ssh clone - creates a pack, copies it, then unpacks. This is just
> inefficient on a local machine.

Have you tried the "-l" option for cloning locally?  It's _very_ fast,
even for my tiny little old laptop.

If you add a "-n" that will not checkout the source tree, so you can
compare the time of cloning with the checkout portion.
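
A minimal sketch of trying the two options together, assuming a git new
enough to accept "git init -q", "git -c" and "git -C" (none of which
comes from this thread); the scratch repository and the file name are
made up.

```shell
#!/bin/sh
# Build a throwaway repository, then clone it locally without a checkout.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/src"
echo hello >"$work/src/file"
git -C "$work/src" add file
git -C "$work/src" -c user.name=Demo -c user.email=demo@example.invalid \
	commit -q -m "initial"

# -l: treat the source as a local repository; -n: skip the checkout step.
git clone -q -l -n "$work/src" "$work/copy"

# The clone has the history but no working-tree files yet.
git -C "$work/copy" rev-parse HEAD
ls -A "$work/copy"
```

Prefixing the clone line with "time", once with and once without "-n",
separates the cost of cloning from the cost of the checkout.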

thanks,

greg k-h

* Re: [Census] So who uses git?
  2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
  2006-01-29  2:14         ` Morten Welinder
  2006-01-29 10:09         ` Keith Packard
@ 2006-01-29 18:37         ` Dave Jones
  2006-01-29 20:17           ` Daniel Barkalow
  2006-01-30 18:58         ` Carl Baldwin
  3 siblings, 1 reply; 110+ messages in thread
From: Dave Jones @ 2006-01-29 18:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Keith Packard, Martin Langhoff, Linus Torvalds, Git Mailing List

On Sat, Jan 28, 2006 at 01:08:54PM -0800, Junio C Hamano wrote:
 > Can I hear experiences from other big projects that tried to use
 > git [*1*]?  I suspect there are many that have tried, and I
 > would not be surprised at all if git did not work out well for
 > them.  For projects that already run on a (free) SCM, I would be
 > very surprised if the developers find the current git 10 times
 > better than the SCM they have been using (probably with an
 > exception of CVS), unless they have very specific need, such as
 > parallel development of distributed nature like the Linux
 > kernel.

I've found switching from cvs->git even for small projects has
made me more productive.  In part because it's got me away from
the 'check in to a centralised server like sourceforge' mentality,
without the need to set up a local cvs server of my own.
Adding changesets to a small project like x86info, now takes
seconds, whereas it used to take minutes of thumb-twiddling whilst
I waited for sf.net to do its thing.   The ability to check in
changesets locally whilst I'm travelling, and then push them when
I have network connectivity again is also a massive productivity
win over cvs.

There's also another git usage that I doubt I'm alone in doing.
I regularly use git to import cvs trees from sourceforge etc for
random projects, because I now find browsing history of projects
with tools like gitk much nicer than any cvs tool I've used.
(cvs annotate is the only thing I really miss).

What would be really cool, would be a web page pointing to public
conversions of various projects' cvs trees, so that everyone doesn't
have to keep hammering various repos to do the conversions themselves.
(Sort of a pseudo bkbits.net).

		Dave

* Re: [Census] So who uses git?
  2006-01-29 14:19             ` Morten Welinder
@ 2006-01-29 20:15               ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-01-29 20:15 UTC (permalink / raw)
  To: Morten Welinder; +Cc: git

Morten Welinder <mwelinder@gmail.com> writes:

> If I understand this right, that means that for a log file (in this
> case a ChangeLog file) that is appended to linearly as a
> function of revision number, we have...
>
> cvs: O(n) archive size
> git: O(n*n) archive size
>
> At least that is what we get if revision N is always deltad over
> revision N-1.  A good deal could be saved if instead of dumping
> a full copy every 10 revisions, that revision would instead be
> deltad off an earlier revision, but I think it'll still be O(n*n).

I have not counted O()rders, but it is not as simple as that,
because we do not really compare "versions".  If version N
reverts a change version N-1 introduced since version N-2, we
would not even store a copy for version N and version N-2
separately.  We just store a single copy, which may be delta
information against version N-1 (or the other way around and N-1
might be delta against N).

For the sake of math, let's say this project keeps only one
file, append only ChangeLog, with a straight line of development
without branches ("single strand of pearls"), and has revisions
1..N.

In RCS, you would have a full copy of the revision N, and
revision J is recorded as delta from revision J+1 for 1 <= J < N.
This delta is similar to "ed" script, and going backwards in the
history for the ChangeLog example means only line deletion is
involved, so what was removed is not recorded.  It records how
many lines are removed from where.  This is _very_ efficient and
compact.

In git, we would have a full copy of version N (because we favor
keeping larger blob associated with newer commits as a full
copy), and essentially the same thing as RCS happens.  The only
difference is that our "delta" is binary delta, but in this
case, it just records "copy N bytes from here to here" which
results in about the same amount of information to represent
each delta.  As you say, if (10 < N), we would have a full copy
every once in a while.  You could use depth other than the
default to make this chaining longer and if you did so, your
repository would be *very* compactly compressed.

However, the cost of retrieving version 1 is quite different.  The RCS
format is O(n) -- you start from the tip, extract and interpret
(N-1) deltas and apply them in turn to get to what you want.

The cost of extracting an arbitrary version is bounded in a git
packfile, because you need to do such an "extract, interpret and
apply" at most $depth cycles.  This is primarily because we do
not store "versions" but individual objects, and do not apply
"newer revisions are far more likely to be accessed often"
heuristics, which RCS format is designed for.
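
The retrieval-cost difference described above can be sketched with a
toy model that only counts delta applications.  This is not how RCS or
git packfiles are actually implemented; the function names, the
"one unit appended per revision" assumption, and the depth of 10 are
illustrative choices based on the numbers mentioned in this thread.

```python
def rcs_deltas_to_apply(version: int, tip: int) -> int:
    """Reverse-delta chain: only the tip is stored in full, so reaching
    `version` means applying one delta per step back from the tip."""
    if not 1 <= version <= tip:
        raise ValueError("version must be in 1..tip")
    return tip - version


def pack_deltas_to_apply(version: int, tip: int, depth: int = 10) -> int:
    """Depth-limited chain (toy model): walking back from the tip, every
    (depth + 1)-th object is a full copy and the rest are deltas."""
    if not 1 <= version <= tip:
        raise ValueError("version must be in 1..tip")
    return (tip - version) % (depth + 1)


def full_copy_units(tip: int, depth: int = 10) -> int:
    """Total size of the full copies in the toy model, assuming the
    append-only file grows by one unit per revision (revision k is
    k units long)."""
    return sum(range(tip, 0, -(depth + 1)))


if __name__ == "__main__":
    n = 1000
    worst = max(pack_deltas_to_apply(v, n) for v in range(1, n + 1))
    print("RCS, oldest revision:", rcs_deltas_to_apply(1, n), "deltas")
    print("pack, worst case:", worst, "deltas")
    print("pack, full-copy units:", full_copy_units(n))
```

In this model the RCS cost of reaching an old revision grows with N,
while the depth-limited cost never exceeds the depth.  The price is the
periodic full copies, whose total size grows roughly as
N*N / (2 * (depth + 1)) for the append-only case, so a larger depth
shrinks the constant at the expense of longer chains.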

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors)
  2006-01-29 10:12       ` Fredrik Kuivinen
@ 2006-01-29 20:15         ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-01-29 20:15 UTC (permalink / raw)
  To: Fredrik Kuivinen
  Cc: Linus Torvalds, Martin Langhoff, Git Mailing List, keithp

Fredrik Kuivinen <freku045@student.liu.se> writes:

> Would it make sense to add an optional
>
>    mergeresult <tree>
>
> line to merge commit objects?

Two issues and a half.

(1) Not all conflicting merge cases can write a sensible
    "conflicted intermediate auto-merge result".  Look for cases
    where we punt in git-merge-one-file.

(2) Modulo issue (1), it can be re-computed if and when needed,
    so this is akin to "storing rename information in the commit
    by detecting renames while merging".

(3) Depending on the direction you pull, you would have
    logically the same "conflicted auto-merge result" that has
    <<< === >>> delimited hunks in reverse.  Which one should
    you record?

And annotate would not be helped much -- if it is needed you
could recompute it at that point.  Annotate needs to look at the
diff from each parent _anyway_ to assign blames.

By the way, I brought up the issue (3) because it relates to how
my latest toy "git rerere" works ;-).

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-29 18:37         ` Dave Jones
@ 2006-01-29 20:17           ` Daniel Barkalow
  2006-01-29 20:29             ` Martin Langhoff
  2006-01-30 15:23             ` Mike McCormack
  0 siblings, 2 replies; 110+ messages in thread
From: Daniel Barkalow @ 2006-01-29 20:17 UTC (permalink / raw)
  To: Dave Jones
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

On Sun, 29 Jan 2006, Dave Jones wrote:

> On Sat, Jan 28, 2006 at 01:08:54PM -0800, Junio C Hamano wrote:
>  > Can I hear experiences from other big projects that tried to use
>  > git [*1*]?  I suspect there are many that have tried, and I
>  > would not be surprised at all if git did not work out well for
>  > them.  For projects that already run on a (free) SCM, I would be
>  > very surprised if the developers find the current git 10 times
>  > better than the SCM they have been using (probably with an
>  > exception of CVS), unless they have very specific need, such as
>  > parallel development of distributed nature like the Linux
>  > kernel.
> 
> I've found switching from cvs->git even for small projects has
> made me more productive.  In part because it's got me away from
> the 'check in to a centralised server like sourceforge' mentality,
> without the need to set up a local cvs server of my own.
> Adding changesets to a small project like x86info, now takes
> seconds, whereas it used to take minutes of thumb-twiddling whilst
> I waited for sf.net to do its thing.   The ability to check in
> changesets locally whilst I'm travelling, and then push them when
> I have network connectivity again is also a massive productivity
> win over cvs.
> 
> There's also another git usage that I doubt I'm alone in doing.
> I regularly use git to import cvs trees from sourceforge etc for
> random projects, because I now find browsing history of projects
> with tools like gitk much nicer than any cvs tool I've used.
> (cvs annotate is the only thing I really miss).

I think this is the real driving factor for git adoption: it doesn't have 
to be 10x better for people to use it, because individuals can use it for 
interacting with CVS projects without causing anybody else any pain. It 
doesn't just enable distributed development, it enables a distributed 
choice of SCM, which means a much lower activation energy threshold. I 
think we'll see a lot more adoption when we have a CVS daemon interface 
(so projects can stop having a CVS repository, and support both sorts of 
users with a git repository and have better metadata), and also if someone 
sets up a place for putting git imports of CVS projects, so people will 
know that other people are using git.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-29 20:17           ` Daniel Barkalow
@ 2006-01-29 20:29             ` Martin Langhoff
  2006-01-30 15:23             ` Mike McCormack
  1 sibling, 0 replies; 110+ messages in thread
From: Martin Langhoff @ 2006-01-29 20:29 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Dave Jones, Junio C Hamano, Keith Packard, Linus Torvalds,
	Git Mailing List

On 1/30/06, Daniel Barkalow <barkalow@iabervon.org> wrote:
> > There's also another git usage that I doubt I'm alone in doing.
> > I regularly use git to import cvs trees from sourceforge etc for
> > random projects, because I now find browsing history of projects
> > with tools like gitk much nicer than any cvs tool I've used.
> > (cvs annotate is the only thing I really miss).
>
> I think this is the real driving factor for git adoption: it doesn't have
> to be 10x better for people to use it, because individuals can use it for
> interacting with CVS projects without causing anybody else any pain.

IMHO, this is a killer feature of GIT. From a CVS/SVN user point of
view, it has vendor branches done right. At work, we do that with
Moodle, Elgg, EPrints and GForge. And the list is growing. That's why
I'm working on the toolchain to make interop with CVS smooth so I can
land patches in upstream projects where I have cvs access.

cheers,


m

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-29 20:17           ` Daniel Barkalow
  2006-01-29 20:29             ` Martin Langhoff
@ 2006-01-30 15:23             ` Mike McCormack
  1 sibling, 0 replies; 110+ messages in thread
From: Mike McCormack @ 2006-01-30 15:23 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Dave Jones, Junio C Hamano, Keith Packard, Martin Langhoff,
	Linus Torvalds, Git Mailing List

Daniel Barkalow wrote:
> I think we'll see a lot more adoption when we have a CVS daemon interface 
> (so projects can stop having a CVS repository, and support both sorts of 
> users with a git repository and have better metadata), and also if someone 
> sets up a place for putting git imports of CVS projects, so people will 
> know that other people are using git.

The Wine project is using a GIT repository which is mirrored into CVS. 
Alexandre wrote scripts to mirror GIT commits into CVS, so developers 
can use whichever they're more comfortable with, and the CVS repository 
remains up to date.

We've found that patch submitters using GIT tend to send multiple 
patches per day, and that those using CVS tend to send a patch or two 
occasionally or just keep up to date with the source.

Mike

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
                           ` (2 preceding siblings ...)
  2006-01-29 18:37         ` Dave Jones
@ 2006-01-30 18:58         ` Carl Baldwin
  2006-01-31 10:27           ` Johannes Schindelin
  2006-02-01 19:32           ` H. Peter Anvin
  3 siblings, 2 replies; 110+ messages in thread
From: Carl Baldwin @ 2006-01-30 18:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Keith Packard, Martin Langhoff, Linus Torvalds, Git Mailing List

Junio,

You don't seem to give git enough credit.  I am a hardware engineer with
many softwareish responsibilities.  One of those is to keep up to date
with the many commercial and free SCM type tools that are available.

Git has become my SCM tool of choice for many reasons.

- Anyone can install and fire it up without license/contract hassles.

- The infrastructure barriers to getting a project started with git are
  about as low as they can be.

- Geographically distributed teams even inside a corporation are
  becoming more common.  Git's repository design meets this need
  perfectly.

- The repository is also designed to be inherently safe from
  data-loss and corruption even in the face of concurrent writes due to
  each object's immutable nature.

- While on the subject of the repository.  Good job keeping it simple.
  I was able to learn pretty much all there is to know from a technical
  stand-point about the objects and refs directories in an afternoon.
  It follows a principle I always work toward myself.  "Make it simple
  enough that there are obviously no deficiencies rather than making it
  complicated so that there are no obvious deficiencies."

- In my opinion git is flexible enough to support just about any
  development/build/release flow that one can think of.  Most of the
  free tools (including subversion and arch) make branching and merging
  --- on which most of these flows rely --- way too heavy-weight.  Git
  shows how light-weight it can be.

  Not only can parallel development happen easily between
  users/repositories but parallel development is trivial even within the
  same repository.  I think your 'pu' system illustrates how powerful
  it can be.  I myself have had up to four concurrent branches where I
  implemented four different features in parallel in the same repository
  easily switching between them.  It was almost too easy to bring them
  together using merge as each one finished.

  I was just reading through an article on how to choose an SCM last
  week and I kept thinking how git could be used to meet almost every
  one (if not all) of the needs discussed.

- Git supports enough network protocols to make it immediately useful in
  about any situation with firewalls and such.  This is where it leaves
  monotone behind.

The biggest hurdle that I've seen in adopting git is training the users.
I myself took to it like a duck to water but I've found that even some
of my brightest colleagues have trouble wrapping their heads around it.
Currently, I'm trying to look at what parts they are having the most
trouble with.  In general, I think it is grasping the reason for the
index file and how git commands like git-commit and git-diff interact
with it.

Even so, I've always appreciated those tools that may have a steeper
learning curve but that pay dividends over time.  Also, I should mention
that this learning curve has been flattening over time as git has
developed and obtained more porcelainish commands.

Carl

On Sat, Jan 28, 2006 at 01:08:54PM -0800, Junio C Hamano wrote:
> Keith Packard <keithp@keithp.com> writes:
> 
> Wow.......  You are switching Cairo and X.org from CVS to git?
> 
> It could be that anything is better than CVS these days, but I
> have to admit that my jaw dropped after reading this, primarily
> because I have never touched anything as big as X.
> 
> Awestruck, dumbstruck,... Xstruck.  Yeah, I know I should have
> more faith in git.  Earlier I heard Wine folks are running git
> in parallel with CVS as their dual primary SCM now, and of
> course git is the primary SCM for the Linux kernel project.

-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Carl Baldwin                        RADCAD (R&D CAD)
 Hewlett Packard Company
 MS 88                               work: 970 898-1523
 3404 E. Harmony Rd.                 work: Carl.N.Baldwin@hp.com
 Fort Collins, CO 80525              home: Carl@ecBaldwin.net
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-29 11:18           ` Radoslaw Szkodzinski
  2006-01-29 18:12             ` Greg KH
@ 2006-01-30 22:51             ` Alex Riesen
  2006-01-31 21:25               ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Alex Riesen @ 2006-01-30 22:51 UTC (permalink / raw)
  To: Radoslaw Szkodzinski
  Cc: Keith Packard, Junio C Hamano, cworth, Martin Langhoff,
	Linus Torvalds, Git Mailing List

Radoslaw Szkodzinski, Sun, Jan 29, 2006 12:18:45 +0100:
> > Fortunately, there are very few people involved with any specific piece
> > of the X.org distribution; there's really only one or two people
> > actively developing the X.org core server, so that part of the migration
> > will be easy. Our users will be stuck, but there aren't many of them
> > either, and git makes just sucking the current bits pretty easy. 
> 
> Not under Windows (bleh), but its support for Cygwin is getting better
> and better.
> 

I use git in cygwin for a project with more than 17k files (almost 6M lines).
It's real slow on ntfs (on 3.2GHz PIV!), PITA on fat, and has some hiccups now
and then (of the kind: "windows unexpectedly does not have feature X, which
everything else has" or "windows broke a 20-year-old feature Y").

But it's more intuitive and more powerful than any alternatives here (Perforce,
SVN and CVS come to mind).

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-30 18:58         ` Carl Baldwin
@ 2006-01-31 10:27           ` Johannes Schindelin
  2006-01-31 15:24             ` Carl Baldwin
                               ` (2 more replies)
  2006-02-01 19:32           ` H. Peter Anvin
  1 sibling, 3 replies; 110+ messages in thread
From: Johannes Schindelin @ 2006-01-31 10:27 UTC (permalink / raw)
  To: Carl Baldwin
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Hi,

On Mon, 30 Jan 2006, Carl Baldwin wrote:

> In general, I think it is grasping the reason for the index file and how 
> git commands like git-commit and git-diff interact with it.

IMHO this is the one big showstopper. I had problems explaining the 
concept myself.

For example, I had a hard time explaining to a friend why a git-add'ed 
file is committed when saying "git commit some_other_file", but not 
another (modified) file. Very unintuitive.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 10:27           ` Johannes Schindelin
@ 2006-01-31 15:24             ` Carl Baldwin
  2006-01-31 15:31               ` Johannes Schindelin
  2006-01-31 17:30             ` Linus Torvalds
  2006-01-31 23:16             ` Daniel Barkalow
  2 siblings, 1 reply; 110+ messages in thread
From: Carl Baldwin @ 2006-01-31 15:24 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

It's difficult to explain because it breaks away from the precedent set
by other SCMs.  I wouldn't call it a show-stopper for this reason.  In
fact, some who have wrapped their heads around the concept might call it
a valuable feature.  I, myself, have found it a handy thing in certain
circumstances.  In other circumstances I simply bypass it by adding -a
to the command-line.

This doesn't fit my definition of a show-stopper.

Carl

On Tue, Jan 31, 2006 at 11:27:34AM +0100, Johannes Schindelin wrote:
> Hi,
> 
> On Mon, 30 Jan 2006, Carl Baldwin wrote:
> 
> > In general, I think it is grasping the reason for the index file and how 
> > git commands like git-commit and git-diff interact with it.
> 
> IMHO this is the one big showstopper. I had problems explaining the 
> concept myself.
> 
> For example, I had a hard time explaining to a friend why a git-add'ed 
> file is committed when saying "git commit some_other_file", but not 
> another (modified) file. Very unintuitive.
> 
> Ciao,
> Dscho
> 
> 

-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Carl Baldwin                        RADCAD (R&D CAD)
 Hewlett Packard Company
 MS 88                               work: 970 898-1523
 3404 E. Harmony Rd.                 work: Carl.N.Baldwin@hp.com
 Fort Collins, CO 80525              home: Carl@ecBaldwin.net
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 15:24             ` Carl Baldwin
@ 2006-01-31 15:31               ` Johannes Schindelin
  0 siblings, 0 replies; 110+ messages in thread
From: Johannes Schindelin @ 2006-01-31 15:31 UTC (permalink / raw)
  To: Carl Baldwin
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Hi,

On Tue, 31 Jan 2006, Carl Baldwin wrote:

> It's difficult to explain because it breaks away from the precedent set
> by other SCMs.  I wouldn't call it a show-stopper for this reason.

I don't.

The strange concept from the user's perspective is that

	git commit -m "some message" file-a.txt

can commit file-b.txt also.

> [...] In other circumstances I simply bypass it by adding -a to the 
> command-line.

This is a different thing.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 10:27           ` Johannes Schindelin
  2006-01-31 15:24             ` Carl Baldwin
@ 2006-01-31 17:30             ` Linus Torvalds
  2006-01-31 18:12               ` J. Bruce Fields
                                 ` (2 more replies)
  2006-01-31 23:16             ` Daniel Barkalow
  2 siblings, 3 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-01-31 17:30 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Carl Baldwin, Junio C Hamano, Keith Packard, Martin Langhoff,
	Git Mailing List



On Tue, 31 Jan 2006, Johannes Schindelin wrote:
> 
> On Mon, 30 Jan 2006, Carl Baldwin wrote:
> 
> > In general, I think it is grasping the reason for the index file and how 
> > git commands like git-commit and git-diff interact with it.
> 
> IMHO this is the one big showstopper. I had problems explaining the 
> concept myself.
> 
> For example, I had a hard time explaining to a friend why a git-add'ed 
> file is committed when saying "git commit some_other_file", but not 
> another (modified) file. Very unintuitive.

I really think you should explain it one of two ways:

 - ignore it. Never _ever_ use git-update-index directly, and don't tell 
   people about using individual filenames with git-commit. Maybe even add 
   "-a" by default to the git-commit flags as a special installation 
   addition.

 - talk about the index, and revel in it as a way to explain the staging 
   area. This is what the old tutorial.txt did before it got simplified.

The "ignore the index" approach is the simple one to explain. It's 
strictly less powerful, but hey, what else is new? 

		Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 17:30             ` Linus Torvalds
@ 2006-01-31 18:12               ` J. Bruce Fields
  2006-01-31 19:33                 ` Junio C Hamano
  2006-01-31 19:01               ` Keith Packard
  2006-02-01 19:34               ` H. Peter Anvin
  2 siblings, 1 reply; 110+ messages in thread
From: J. Bruce Fields @ 2006-01-31 18:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Carl Baldwin, Junio C Hamano, Keith Packard,
	Martin Langhoff, Git Mailing List

On Tue, Jan 31, 2006 at 09:30:48AM -0800, Linus Torvalds wrote:
> I really think you should explain it one of two ways:
> 
>  - ignore it. Never _ever_ use git-update-index directly, and don't tell 
>    people about using individual filenames with git-commit. Maybe even add 
>    "-a" by default to the git-commit flags as a special installation 
>    addition.
> 
>  - talk about the index, and revel in it as a way to explain the staging 
>    area. This is what the old tutorial.txt did before it got simplified.
> 
> The "ignore the index" approach is the simple one to explain. It's 
> strictly less powerful, but hey, what else is new? 

Yeah, I do wonder what's likely to be the best approach for most users.
My goal with the new tutorial was to get a reader doing something fun
and useful as quickly as possible.  So it just refers elsewhere for any
discussion of the index file or SHA1 names.  But probably everyone needs
to pick up that stuff eventually anyway, and maybe it's better to get to
it a little sooner, I dunno.

Besides the git-add/git-commit thing, the other thing that caught me by
surprise was the behaviour of git reset.  I expected there to be an
"inverse" to git commit -a, meaning that

	1) the sequence
		git reset HEAD^
		git commit -a
	   would be a no-op, in the sense that the new commit would
	   get the same changes as the old one, and
	2) the sequence
		git commit -a
		git reset HEAD^
	   would be a no-op, in the sense that "git diff" would report
	   the same diff before and after.

But there isn't, and explaining how --soft and --mixed actually work
requires referring to the index file.

Is that something that can be fixed in the tools or does the user
fundamentally need to know about the index file to do this kind of
stuff?
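
One way to see why neither sequence is a no-op is a toy model with
three states (commits, index, working tree).  The sketch below is not
git code; the class and method names are made up for illustration, and
it assumes that "commit -a" only picks up paths already in the index
and that "reset" never touches the working tree.

```python
from dataclasses import dataclass


@dataclass
class ToyRepo:
    commits: list   # list of {path: content} snapshots; the last one is HEAD
    index: dict     # staged {path: content}
    worktree: dict  # files on disk {path: content}

    def commit_a(self) -> None:
        # Like "git commit -a": refresh paths already in the index
        # (modified or deleted); untracked files are left alone.
        for path in list(self.index):
            if path in self.worktree:
                self.index[path] = self.worktree[path]
            else:
                del self.index[path]
        self.commits.append(dict(self.index))

    def reset_to_parent(self, mode: str = "mixed") -> None:
        # "soft" moves HEAD only; "mixed" also resets the index.
        # Neither touches the working tree.
        self.commits.pop()
        if mode == "mixed":
            self.index = dict(self.commits[-1])

    def diff(self) -> set:
        # Like plain "git diff": index versus working tree, tracked paths.
        return {p for p in self.index if self.worktree.get(p) != self.index[p]}


def sequence_one(mode: str) -> bool:
    """reset HEAD^ then commit -a: does the new commit match the old one?"""
    old = {"a": "2", "b": "1"}           # the commit being undone added file b
    repo = ToyRepo([{"a": "1"}, dict(old)], dict(old), dict(old))
    repo.reset_to_parent(mode)
    repo.commit_a()
    return repo.commits[-1] == old


def sequence_two(mode: str) -> bool:
    """commit -a then reset HEAD^: does "git diff" report the same paths?"""
    repo = ToyRepo([{"a": "1", "c": "1"}],
                   {"a": "2", "c": "1"},  # a was already staged
                   {"a": "2", "c": "2"})
    before = repo.diff()
    repo.commit_a()
    repo.reset_to_parent(mode)
    return repo.diff() == before


if __name__ == "__main__":
    for mode in ("soft", "mixed"):
        print(mode, sequence_one(mode), sequence_two(mode))
```

In this model, sequence 1 round-trips only with the soft variant,
because a mixed reset drops the newly added file from the index and
"commit -a" does not pick it up again.  Sequence 2 fails either way as
soon as something was staged beforehand, since the index no longer
looks the way it did before the commit.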

--b.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-29 18:12             ` Greg KH
@ 2006-01-31 18:33               ` Radoslaw Szkodzinski
  2006-01-31 19:50                 ` Radoslaw Szkodzinski
  0 siblings, 1 reply; 110+ messages in thread
From: Radoslaw Szkodzinski @ 2006-01-31 18:33 UTC (permalink / raw)
  To: Greg KH
  Cc: Keith Packard, Junio C Hamano, cworth, Martin Langhoff,
	Linus Torvalds, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1115 bytes --]

Greg KH wrote:
> On Sun, Jan 29, 2006 at 12:18:45PM +0100, Radoslaw Szkodzinski wrote:
>> The only drawback is local cloning. This operation is like 4x slower
>> than plain copying of the repository. Probably because it works like an
>> ssh clone - creates a pack, copies it, then unpacks. This is just
>> inefficient on a local machine.
> 
> Have you tried the "-l" option for cloning locally?  It's _very_ fast,
> even for my tiny little old laptop.

Because it's cp -rl <one-tree> <second-tree> and some file modifications, right?
It's what I've been using already.

This -l option should be more prominent in the documentation.
Maybe it even already is. I've taught myself using git before 0.9.

Thank you. This helps a lot.

> If you add a "-n" that will not checkout the source tree, so you can
> compare the time of cloning with the checkout portion.

Cloning without -l option is much slower - some minutes vs below a minute.
I could have time(8)d it, but it's no use.
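
For reference, what "cp -rl" does to a tree can be sketched in a few
lines of Python.  This only illustrates hard-linking a directory tree;
hardlink_tree is a made-up helper, and whether "git clone -l" works
exactly like this is the guess made above, not something this sketch
confirms.

```python
import os


def hardlink_tree(src: str, dst: str) -> None:
    """Recreate the directory layout of src under dst and hard-link
    every file, so no file data is copied."""
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # Both names now refer to the same inode on disk.
            os.link(os.path.join(dirpath, name),
                    os.path.join(target_dir, name))
```

Because only directory entries are created, the cost depends on the
number of files rather than their size, and both trees must live on
the same filesystem.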

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 17:30             ` Linus Torvalds
  2006-01-31 18:12               ` J. Bruce Fields
@ 2006-01-31 19:01               ` Keith Packard
  2006-01-31 19:21                 ` Linus Torvalds
  2006-01-31 20:56                 ` Sam Ravnborg
  2006-02-01 19:34               ` H. Peter Anvin
  2 siblings, 2 replies; 110+ messages in thread
From: Keith Packard @ 2006-01-31 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: keithp, Johannes Schindelin, Carl Baldwin, Junio C Hamano,
	Martin Langhoff, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 634 bytes --]

On Tue, 2006-01-31 at 09:30 -0800, Linus Torvalds wrote:

>  - ignore it. Never _ever_ use git-update-index directly, and don't tell 
>    people about using individual filenames with git-commit. Maybe even add 
>    "-a" by default to the git-commit flags as a special installation 
>    addition.

As a newly initiated user, this would have been a more gentle
introduction to the system. But, it would be hard to make it entirely
invisible given the current interfaces. I'm not sure if obscuring the
presence of the index is a great plan; it's already hard enough to
figure out how it works.

-- 
keith.packard@intel.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:01               ` Keith Packard
@ 2006-01-31 19:21                 ` Linus Torvalds
  2006-01-31 22:55                   ` Joel Becker
  2006-01-31 20:56                 ` Sam Ravnborg
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2006-01-31 19:21 UTC (permalink / raw)
  To: Keith Packard
  Cc: Johannes Schindelin, Carl Baldwin, Junio C Hamano,
	Martin Langhoff, Git Mailing List

On Tue, 31 Jan 2006, Keith Packard wrote:

> On Tue, 2006-01-31 at 09:30 -0800, Linus Torvalds wrote:
> 
> >  - ignore it. Never _ever_ use git-update-index directly, and don't tell 
> >    people about using individual filenames with git-commit. Maybe even add 
> >    "-a" by default to the git-commit flags as a special installation 
> >    addition.
> 
> As a newly initiated user, this would have been a more gentle
> introduction to the system. But, it would be hard to make it entirely
> invisible given the current interfaces. I'm not sure if obscuring the
> presence of the index is a great plan; it's already hard enough to
> figure out how it works.

Now, I do agree. I don't actually like hiding the index too much. 
Understanding the index is _invaluable_ whenever you're doing a merge with 
conflicts, and understanding what tools are available to you to resolve 
those conflicts.

The index is also obviously very important when you do a partial commit, 
and it's something I do end up doing quite often. Again, maybe that's not 
something that a new git user should be encouraged to ever do, but it's a 
huge convenience feature for power-users.

Understanding the index also allows people to understand certain 
performance-characteristics of git, and explains how "git add" (and 
remove, if we had one) actually works independently of the commit. 

So I'm actually of the "revel in the index" camp (as could probably be 
guessed by the original tutorial).

My personal suggestion would be to introduce git "gently" by ignoring it, 
but by the time a person actually _works_ on a project (as opposed to just 
going through a tutorial or following another person's project), he/she 
should probably have been introduced to the index in order to understand 
what happens and to use its power.

(In particular, the difference between "git diff" and "git diff HEAD" is 
an important one to understand eventually).

			Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 18:12               ` J. Bruce Fields
@ 2006-01-31 19:33                 ` Junio C Hamano
  2006-01-31 19:44                   ` Jon Loeliger
  2006-01-31 20:06                   ` J. Bruce Fields
  0 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-01-31 19:33 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Linus Torvalds, Johannes Schindelin, Carl Baldwin, Keith Packard,
	Martin Langhoff, Git Mailing List

"J. Bruce Fields" <bfields@fieldses.org> writes:

> On Tue, Jan 31, 2006 at 09:30:48AM -0800, Linus Torvalds wrote:
>>
>> The "ignore the index" approach is the simple one to explain. It's 
>> strictly less powerful, but hey, what else is new? 
>
> Yeah, I do wonder what's likely to be the best approach for most users.
> My goal with the new tutorial was to get a reader doing something fun
> and useful as quickly as possible.  So it just refers elsewhere for any
> discussion of the index file or SHA1 names.  But probably everyone needs
> to pick up that stuff eventually anyway, and maybe it's better to get to
> it a little sooner, I dunno.

I think much of the good stuff git offers would not be helpful to the
users until index is understood as the third entity, in addition
to the usual "committed state" and "working tree state".  It
might be better to talk about it sooner rather than later.  And
the tool is geared towards taking advantage of it, so until the
user understands that, behaviour of some tools would feel
unintuitive.

You can have local throw-away modifications while applying
patches and merging (I once broke merges by ignoring that it is
perfectly valid to have index and working tree files be
different and keep working that way.  That was a hard lesson).
The index file knows what working tree changes are meant to be
committed.  Another thing I find useful, which cannot be done
without index, is to sanity check while developing.  When "git
diff" gives too many diffs, running update-index on paths that I
think are more-or-less OK helps to reduce clutter, and I can
view only further changes to those paths.

In a sense, update-index can be thought of as checking in the
changes without committing.  You can check in any number of times,
and the cumulative effect is committed later.  "reset --mixed"
is undoing these uncommitted check-ins.  "reset --hard" undoes
the last commit.
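
The "third entity" view can be sketched as three snapshots and one
comparison function.  This is a toy model rather than git itself; the
helper names, file names and version strings are invented for
illustration.

```python
def changes(old: dict, new: dict) -> dict:
    """Map each differing path to its (old, new) content."""
    return {p: (old.get(p), new.get(p))
            for p in sorted(old.keys() | new.keys())
            if old.get(p) != new.get(p)}


def update_index(index: dict, worktree: dict, path: str) -> dict:
    """Model of "git update-index <path>": record the working tree
    content of one path in the index and return the new index."""
    staged = dict(index)
    staged[path] = worktree[path]
    return staged


head = {"main.c": "v1", "util.c": "v1"}      # committed state
index = dict(head)                            # nothing checked in yet
worktree = {"main.c": "v2", "util.c": "v2"}   # both files edited

# "git diff" compares index and working tree; "git diff HEAD" compares
# the last commit and the working tree.
noisy = changes(index, worktree)

# main.c looks more-or-less OK, so check it in without committing.
index = update_index(index, worktree, "main.c")
quieter = changes(index, worktree)
against_head = changes(head, worktree)

# Keep hacking on main.c: "git diff" shows only the further change.
worktree["main.c"] = "v3"
further = changes(index, worktree)

# "reset --mixed" in this model: forget the uncommitted check-ins.
index = dict(head)
after_reset = changes(index, worktree)

if __name__ == "__main__":
    for name in ("noisy", "quieter", "against_head", "further", "after_reset"):
        print(name, globals()[name])
```

After the update-index step, the index-versus-working-tree comparison
drops main.c while the comparison against the commit still lists both
files, which is the clutter reduction described above.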

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:33                 ` Junio C Hamano
@ 2006-01-31 19:44                   ` Jon Loeliger
  2006-01-31 19:52                     ` Junio C Hamano
       [not found]                     ` <7vd5i8w2nc.fsf@assigned-by-dhcp.cox.net>
  2006-01-31 20:06                   ` J. Bruce Fields
  1 sibling, 2 replies; 110+ messages in thread
From: Jon Loeliger @ 2006-01-31 19:44 UTC (permalink / raw)
  To: Git List

On Tue, 2006-01-31 at 13:33, Junio C Hamano wrote:
> "J. Bruce Fields" <bfields@fieldses.org> writes:

> I think much of the good stuff git offers would not be helpful to the
> users until index is understood as the third entity, in addition
> to the usual "committed state" and "working tree state".  It
> might be better to talk about it sooner rather than later.  And
> the tool is geared towards taking advantage of it, so until the
> user understands that, behaviour of some tools would feel
> unintuitive.

Agreed.

> You can have local throw-away modifications while applying
> patches and merging (I once broke merges by ignoring that it is
> perfectly valid to have index and working tree files be
> different and keep working that way.  That was a hard lesson).
> The index file knows what working tree changes are meant to be
> committed.  Another thing I find useful, which cannot be done
> without index, is to sanity check while developing.  When "git
> diff" gives too many diffs, running update-index on paths that I
> think are more-or-less OK helps to reduce clutter, and I can
> view only further changes to those paths.

And right there is where people get caught by surprise.
What "they" then want to do is actually pick certain
files to commit.  And when they do, they get caught off
guard by the _additional_ files.

I have done this style of "update-index on more-or-less OK
files" in order to clear up the diff.  And it is also in that
time frame that I start feeling that certain changes belong
to "one commit" or another.  The result is, I want to then
pick the parts that get committed together.  But _really_
being certain exactly which files, and _only_ those files,
will really be committed is tough.

jdl

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 18:33               ` Radoslaw Szkodzinski
@ 2006-01-31 19:50                 ` Radoslaw Szkodzinski
  2006-01-31 20:43                   ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Radoslaw Szkodzinski @ 2006-01-31 19:50 UTC (permalink / raw)
  To: Greg KH
  Cc: Keith Packard, Junio C Hamano, cworth, Martin Langhoff,
	Linus Torvalds, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1284 bytes --]

Radoslaw Szkodzinski wrote:
> Cloning without -l option is much slower - some minutes vs below a minute.
> I could have time(8)d it, but it's no use.
> 

Make that time(1)d.

Results for the kernel follow. Disc cache has been preheated with find.

git version: 5b2bcc7b2d546c636f79490655b3347acc91d17f
Filesystem: ext3 data=writeback
Kernel: 2.6.16-rc1-astorm2 (mostly -ck patchset with "hotfix")
Elevator: CFQ

time git clone linux-2.6.git linux-2.6.git.new
Packing 180025 objects

real    8m31.637s
user    3m19.571s
sys     0m42.211s

Extremely bad. The task is mostly cpu-bound.
Made some background applications swap out late in the process.
(that's the cause of the sys time)

time git clone -l linux-2.6.git linux-2.6.git.local
0 blocks

real    0m42.339s
user    0m2.818s
sys     0m4.040s

Good enough for me. Possibly cp -rl of objects and then a checkout.

time cp -rl linux-2.6.git linux-2.6.git.rl

real    0m18.333s
user    0m0.103s
sys     0m1.732s

Really fast, but requires additional file modification.
(namely .git/remotes/origin, removal of gitrc)
Also incompatible with apps having problems with hardlinks.

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:44                   ` Jon Loeliger
@ 2006-01-31 19:52                     ` Junio C Hamano
       [not found]                     ` <7vd5i8w2nc.fsf@assigned-by-dhcp.cox.net>
  1 sibling, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-01-31 19:52 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: git

Jon Loeliger <jdl@freescale.com> writes:

> I have done this style of "update-index on more-or-less OK
> files" in order to clear up the diff.  And it is also in that
> time frame that I start feeling that certain changes belong
> to "one commit" or another.  The result is, I want to then
> pick the parts that get committed together.  But _really_
> being certain exactly which files, and _only_ those files,
> will really be committed is tough.

	$ git diff --cached

would help.  If you are _only_ committing either all changes or no
change per path, 'git diff --cached --name-status' would be
sufficient.
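
As a rough illustration of the three states being compared here, the
following is a toy Python model, not git's real data structures.  The
dictionaries and the changed_paths() helper are made up for this
sketch; the only facts relied on are that "git diff --cached" compares
the index against HEAD and plain "git diff" compares the working tree
against the index.

```python
# Toy model: each state maps path -> content.
head = {"a.c": "v1", "b.c": "v1"}        # last commit
index = {"a.c": "v2", "b.c": "v1"}       # a.c was staged with update-index
worktree = {"a.c": "v3", "b.c": "v1x"}   # further edits on disk


def changed_paths(old, new):
    """Return the sorted paths whose content differs between two states."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


# Analogue of "git diff --cached --name-status": index vs. HEAD,
# i.e. what a plain "git commit" would record.
staged = changed_paths(head, index)

# Analogue of "git diff": working tree vs. index,
# i.e. the changes that have not been checked into the index yet.
unstaged = changed_paths(index, worktree)

print("would be committed:", staged)
print("still only in the working tree:", unstaged)
```

In this model a.c shows up in both lists, which is exactly the case
where looking only at "git diff" gives a misleading picture of what
will be committed.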

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:33                 ` Junio C Hamano
  2006-01-31 19:44                   ` Jon Loeliger
@ 2006-01-31 20:06                   ` J. Bruce Fields
  1 sibling, 0 replies; 110+ messages in thread
From: J. Bruce Fields @ 2006-01-31 20:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Johannes Schindelin, Carl Baldwin, Keith Packard,
	Martin Langhoff, Git Mailing List

On Tue, Jan 31, 2006 at 11:33:21AM -0800, Junio C Hamano wrote:
> I think much of the good stuff git offers would not be helpful to the
> users until index is understood as the third entity, in addition
> to the usual "committed state" and "working tree state".  It
> might be better to talk about it sooner rather than later.  And
> the tool is geared towards taking advantage of it, so until the
> user understands that, behaviour of some tools would feel
> unintuitive.

Yeah, makes sense.  But I'd like to introduce that while still
introducing the higher-level tools earlier on than core-tutorial.txt
does.  I'll give some thought to how to move things in that direction,
maybe this weekend....

--b.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:50                 ` Radoslaw Szkodzinski
@ 2006-01-31 20:43                   ` Junio C Hamano
  2006-01-31 21:02                     ` Radoslaw Szkodzinski
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2006-01-31 20:43 UTC (permalink / raw)
  To: Radoslaw Szkodzinski
  Cc: Greg KH, Keith Packard, cworth, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Radoslaw Szkodzinski <astralstorm@gorzow.mm.pl> writes:

> Radoslaw Szkodzinski wrote:
>> Cloning without -l option is much slower - some minutes vs below a minute.
>> I could have time(8)d it, but it's no use.
>> 
>
> Make that time(1)d.
>
> Results for the kernel follow. Disc cache has been preheated with find.

While you are at it, "git clone -l -s -n" might be more interesting.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
       [not found]                     ` <7vd5i8w2nc.fsf@assigned-by-dhcp.cox.net>
@ 2006-01-31 20:56                       ` J. Bruce Fields
  0 siblings, 0 replies; 110+ messages in thread
From: J. Bruce Fields @ 2006-01-31 20:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jon Loeliger, git

On Tue, Jan 31, 2006 at 12:41:59PM -0800, Junio C Hamano wrote:
> On the tutorial front, maybe we could start teaching people to
> always use "commit -a", and not tell them about update-index nor
> "commit paths.." at all.  Have them do "hello world", review
> changes since the last commit with "git diff", and make commit
> with "git commit -a".  Next tell them about index, and after
> they understand index, finally tell them "commit paths..."  is
> there merely to reduce typing.

Yeah, I think that's approximately what you get right now if you read
tutorial.txt followed by core-tutorial.txt, though the two currently may
not really work together well as sequels.

So I'm inclined to start by revising the two to make them read well as
sequels, then maybe moving some of core-tutorial.txt into the earlier
tutorial.txt.  By the time we're done the two might end up being one
document.  Or they might still be two, but with the split being more
clearly beginning/advanced instead of high-level/low-level.

Feedback from people who'd actually worked through the two would
obviously be useful.

--b.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:01               ` Keith Packard
  2006-01-31 19:21                 ` Linus Torvalds
@ 2006-01-31 20:56                 ` Sam Ravnborg
  2006-01-31 22:21                   ` Junio C Hamano
  1 sibling, 1 reply; 110+ messages in thread
From: Sam Ravnborg @ 2006-01-31 20:56 UTC (permalink / raw)
  To: Keith Packard
  Cc: Linus Torvalds, keithp, Johannes Schindelin, Carl Baldwin,
	Junio C Hamano, Martin Langhoff, Git Mailing List

> As a newly initiated user, this would have been a more gentle
> introduction to the system. But, it would be hard to make it entirely
> invisible given the current interfaces. I'm not sure if obscuring the
> presence of the index is a great plan; it's already hard enough to
> figure out how it works.

I have found myself using a mixture of cogito and git commands lately.
Part of it is that my fingers type something like:
rm `git ls-files -m`
cg-restore

and I have not convinced them about git reset --hard


But the primary thing is cg-commit.
It gives you a list of modified files which can be edited, and
it has saved me a couple of times from committing too much.
And I get vi fired up, so no need to fiddle with command line arguments.

   Sam

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 20:43                   ` Junio C Hamano
@ 2006-01-31 21:02                     ` Radoslaw Szkodzinski
  0 siblings, 0 replies; 110+ messages in thread
From: Radoslaw Szkodzinski @ 2006-01-31 21:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Greg KH, Keith Packard, cworth, Martin Langhoff, Linus Torvalds,
	Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1001 bytes --]

Junio C Hamano wrote:
> Radoslaw Szkodzinski <astralstorm@gorzow.mm.pl> writes:
> 
>> Radoslaw Szkodzinski wrote:
>>> Cloning without -l option is much slower - some minutes vs below a minute.
>>> I could have time(8)d it, but it's no use.
>>>
>> Make that time(1)d.
>>
>> Results for the kernel follow. Disc cache has been preheated with find.
> 
> While you are at it, "git clone -l -s -n" might be more interesting.
> 
> 

Sure:

time git clone -l -s -n linux-2.6.git linux-2.6.git.lsn

real    0m0.458s
user    0m0.020s
sys     0m0.027s

Speed demon. I'd use it, but I often need a checkout anyway, so...

time git clone -l -s linux-2.6.git linux-2.6.git.ls

real    0m35.752s
user    0m2.661s
sys     0m2.374s

Not really better than git clone -l and relies on the tools more.
However, it should make for easier repacking and pruning. I'll keep it.

-- 
GPG Key id:  0xD1F10BA2
Fingerprint: 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2

AstralStorm


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-30 22:51             ` Alex Riesen
@ 2006-01-31 21:25               ` Linus Torvalds
  2006-01-31 21:52                 ` J. Bruce Fields
  2006-01-31 22:01                 ` Alex Riesen
  0 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-01-31 21:25 UTC (permalink / raw)
  To: Alex Riesen
  Cc: Radoslaw Szkodzinski, Keith Packard, Junio C Hamano, cworth,
	Martin Langhoff, Git Mailing List

On Mon, 30 Jan 2006, Alex Riesen wrote:
> 
> I use git in cygwin for a project with more than 17k files (almost 6M lines).
> It's real slow on ntfs (on 3.2GHz PIV!)

One thing that git does rely on is a fast "lstat()" system call. The index 
file means that we almost never need to read the contents of a file to 
compare, but git _does_ check that files haven't been modified, and doing 
an "lstat()" on every single file it knows about is the way to do that.
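
A minimal sketch of that idea in Python, under stated assumptions:
the snapshot() and maybe_modified() helpers are invented for this
example, and the set of stat fields compared here is illustrative
rather than the exact set git stores in its index.  The point is only
that the check consults metadata and never opens the file.

```python
import os
import tempfile


def snapshot(path):
    """Record the lstat() fields a cache entry could keep for later comparison."""
    st = os.lstat(path)
    return (st.st_mtime_ns, st.st_size, st.st_ino, st.st_mode)


def maybe_modified(path, cached):
    """True if the file's metadata no longer matches the cached snapshot.

    Only lstat() is called; the file contents are never read.
    """
    return snapshot(path) != cached


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.c")
    with open(path, "w") as f:
        f.write("int main(void) { return 0; }\n")

    cached = snapshot(path)
    clean = maybe_modified(path, cached)   # nothing touched since the snapshot

    with open(path, "a") as f:
        f.write("/* edited */\n")
    dirty = maybe_modified(path, cached)   # the size (at least) has changed

    print("clean check:", clean, "after edit:", dirty)
```

One lstat() per tracked file is cheap when the kernel has the names
cached, which is why the cost shows up mainly on cold caches and on
systems with slow filename lookup.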

Now, I suspect that you simply can't do basic filename lookups much faster 
than Linux does them. The Linux VFS layer name caching reigns supreme: the 
dentries are just incredibly powerful, and the reason Linux kicks ass on 
many benchmarks.

And yes, git was designed for it. git is _really_ fast on Linux, but any 
operating system that is so stupid that it has to call down to the 
low-level filesystem for filename lookup (which is most of them, and from 
what I have heard, the NT VFS layer is worse than most) will take a lot 
longer.

This is sadly not something I think you can possibly avoid. Git is 
literally being as fast as is humanly possible without doing explicit 
locking. You _can_ avoid the "lstat()" calls if you are willing to always 
explicitly mark files that you have changed (so that the SCM can stat just 
_those_ files and ignore all the others), but I personally much prefer 
being able to use any random tools on the files without having to prepare 
them some way.

So we could speed it up on cygwin (and yes, it would speed git up a lot 
even on Linux, but since the cached lstat() case is so fast anyway, I 
doubt a lot of Linux users care - the biggest win would be on a cold-cache 
tree).  But it would require that you explicitly _mark_ the files you edit 
some way.

Btw, BK wanted that, and it wasn't _too_ painful. You had to do

	bk edit

to mark a file as being ready to be dirtied, and as a helper command you 
would use

	bk editor

which would first do the "bk edit" thing and then start up your favourite 
editor (the usual ${EDITOR:-${VISUAL:-vi}} rules applied) on it, and it 
worked fine. We _could_ do the same in git.

I'd just prefer not to.

For small projects (or big projects with fairly few files), it really 
shouldn't matter. Your 17k files example is hopefully fairly rare..

> But it's more intuitive and more powerful than any alternatives here (Perforce,
> SVN and CVS come to mind).

Good to know.

		Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 21:25               ` Linus Torvalds
@ 2006-01-31 21:52                 ` J. Bruce Fields
  2006-01-31 22:01                 ` Alex Riesen
  1 sibling, 0 replies; 110+ messages in thread
From: J. Bruce Fields @ 2006-01-31 21:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alex Riesen, Radoslaw Szkodzinski, Keith Packard, Junio C Hamano,
	cworth, Martin Langhoff, Git Mailing List

On Tue, Jan 31, 2006 at 01:25:08PM -0800, Linus Torvalds wrote:
> So we could speed it up on cygwin (and yes, it would speed git up a lot 
> even on Linux, but since the cached lstat() case is so fast anyway, I 
> doubt a lot of Linux users care - the biggest win would be on a cold-cache 
> tree).  But it would require that you explicitly _mark_ the files you edit 
> some way.

You couldn't depend on a combination of lstat's and some kind of
filesystem change notifications?

--b.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 21:25               ` Linus Torvalds
  2006-01-31 21:52                 ` J. Bruce Fields
@ 2006-01-31 22:01                 ` Alex Riesen
       [not found]                   ` <20060201013901.GA16832@mail.com>
  1 sibling, 1 reply; 110+ messages in thread
From: Alex Riesen @ 2006-01-31 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Radoslaw Szkodzinski, Keith Packard, Junio C Hamano, cworth,
	Martin Langhoff, Git Mailing List

Linus Torvalds, Tue, Jan 31, 2006 22:25:08 +0100:
> > I use git in cygwin for a project with more than 17k files (almost
> > 6M lines).  It's real slow on ntfs (on 3.2GHz PIV!)
> ...
> So we could speed it up on cygwin (and yes, it would speed git up a lot 
> even on Linux, but since the cached lstat() case is so fast anyway, I 
> doubt a lot of Linux users care - the biggest win would be on a cold-cache 
> tree).  But it would require that you explicitly _mark_ the files you edit 
> some way.

I'd hate to have to do that. The project in question is just stuffed
up beyond all reason, windows' VFS is a sorry piece of junk, and I
care much more about how comfortable the tool is.

> ...
> For small projects (or big projects with fairly few files), it really 
> shouldn't matter. Your 17k files example is hopefully fairly rare..

I'd say it is fairly common. It's what projects in big companies,
driven by paranoia and suffering from chronic undereducation,
usually end up with. Frequently right from the start...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 20:56                 ` Sam Ravnborg
@ 2006-01-31 22:21                   ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-01-31 22:21 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: git

"Sam Ravnborg" <sam@ravnborg.org> writes:

> But the primary thing is cg-commit.
> It gives you a list of modified files which can be edited, and
> it has saved me a couple of times from committing too much.
> And I get vi fired up, so no need to fiddle with command line arguments.

[this is what I sent in a separate message but I goofed up the
destination headers and the message did not appear on the list,
so I am reprinting.]

I have always felt "git commit paths..." was a mistake; it
encourages partial commits by individual developers.

By "partial commit", I mean a commit that does not exactly match
the state of the working tree when the commit is made.  There
are two kinds of "partial commits".  Good ones and bad ones.

Being able to make partial commits is handy for people whose
primary role is to integrate many changes from trusted
developers rather than testing each and every commit as a whole
(read: Linus and subsystem maintainers).  Integrators' job may
include testing what has been merged as a whole by a compile
and reboot cycle as the final "wrap-up" step, but the most
important role they play is to sanity check the changes from
an architectural perspective.

For that workflow to work effectively, however, the changes fed
by individual developers to the integrators have to be clean and
well tested.  A partial commit records something that never
existed in any working tree as a whole, so by definition it is
an untested change.  You would risk "sorry I forgot to commit
the changes to these paths but without them it does not even
compile", and end up wasting integrators' time.

The integrators make commits out of their working trees using
git-merge and git-apply to record changes made by others after
reviewing them.  These commands ignore unconflicting local
changes (but notice conflicting ones to operate correctly), and
allow them to make partial commits.  This is a good thing;
otherwise they would have to reset their own changes in their
working tree, only to do merges and to accept patches.  However,
people playing the integrator role rarely have reason to use
"git commit paths..." while merging from others to make such a
partial commit.  Only after they resolve conflicts by hand,
perhaps.  But that happens far less often than careless
individual developers making partial commits of the bad kind using
the same "git commit paths..." command.

This is the reason why I feel "git commit paths..." is a bad
feature.  It helps to make bad partial commits, without having
to do much with making good partial commits.

Many SCMs may have the ability to do "commit paths...", but that
does not change the fact that it encourages carelessness for
individual developers, which is especially bad in a distributed
development workflow like the Linux kernel style [*1*].

But that was not my change ;-).

[Footnote]

*1* It could be argued that being able to do partial commit is a
good thing in other SCM systems where there is no equivalent to
our "index" file.  It is one way for the developer to snapshot
their work-in-progress state where they might later come back to
if the approach they are currently pursuing does not pan out.
But for that, we have the index file we can "check into" without
committing.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 19:21                 ` Linus Torvalds
@ 2006-01-31 22:55                   ` Joel Becker
  2006-02-01 14:43                     ` Johannes Schindelin
  0 siblings, 1 reply; 110+ messages in thread
From: Joel Becker @ 2006-01-31 22:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Keith Packard, Johannes Schindelin, Carl Baldwin, Junio C Hamano,
	Martin Langhoff, Git Mailing List

On Tue, Jan 31, 2006 at 11:21:52AM -0800, Linus Torvalds wrote:
> Now, I do agree. I don't actually like hiding the index too much. 
> Understanding the index is _invaluable_ whenever you're doing a merge with 
> conflicts, and understanding what tools are available to you to resolve 
> those conflicts.

	This is precisely the experience I've had explaining GIT to
folks moving to it.  The simplest workflow (clone; hack one file, commit
one file) is so similar to CVS/Subversion/Anything that it's immediately
understood.  But when pull, push, merge, and any non-linear history are
discussed, I have to describe the index and the commit/tree layout.
Once I do, they get it.

> So I'm actually of the "revel in the index" camp (as could probably be 
> guessed by the original tutorial).

	I'm going to second this, from a real-world "explain it to
others" standpoint.

Joel

-- 

"Every day I get up and look through the Forbes list of the richest
 people in America. If I'm not there, I go to work."
        - Robert Orben

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 10:27           ` Johannes Schindelin
  2006-01-31 15:24             ` Carl Baldwin
  2006-01-31 17:30             ` Linus Torvalds
@ 2006-01-31 23:16             ` Daniel Barkalow
  2006-01-31 23:36               ` Petr Baudis
  2006-01-31 23:47               ` Junio C Hamano
  2 siblings, 2 replies; 110+ messages in thread
From: Daniel Barkalow @ 2006-01-31 23:16 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Carl Baldwin, Junio C Hamano, Keith Packard, Martin Langhoff,
	Linus Torvalds, Git Mailing List

On Tue, 31 Jan 2006, Johannes Schindelin wrote:

> Hi,
> 
> On Mon, 30 Jan 2006, Carl Baldwin wrote:
> 
> > In general, I think it is grasping the reason for the index file and how 
> > git commands like git-commit and git-diff interact with it.
> 
> IMHO this is the one big showstopper. I had problems explaining the 
> concept myself.
> 
> For example, I had a hard time explaining to a friend why a git-add'ed 
> file is committed when saying "git commit some_other_file", but not 
> another (modified) file. Very unintuitive.

I sort of suspect that "git commit some_other_file" should really read 
HEAD into a temporary index, update "some_other_file" in that (and the 
main index), and commit it. The concept of the index isn't hard (it's the 
preparation you've made so far towards a commit), and plain "git commit" 
makes sense with it; "git commit -a" also makes sense, since committing 
all changes is pretty clear. The surprising thing is that "git commit path 
..." means "everything I've already mentioned, plus path..." not just 
"path ...", and it's particularly surprising because people only tend to 
specify paths when they've done something they don't want to commit.
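
To make the difference concrete, here is a toy Python model of the two
behaviours.  It is a sketch only: trees and the index are plain
dictionaries, deleted paths are not handled, and the function names
commit_paths_current() and commit_paths_proposed() are made up for
this example.

```python
def commit_paths_current(index, worktree, paths):
    """Semantics described above: refresh the named paths in the index,
    then commit everything the index holds."""
    new_index = dict(index)
    for path in paths:
        new_index[path] = worktree[path]
    return dict(new_index), new_index  # (committed tree, index afterwards)


def commit_paths_proposed(head, index, worktree, paths):
    """Proposed semantics: start a temporary index from HEAD, update only
    the named paths there (and in the main index), commit the temporary one."""
    temp_index = dict(head)
    new_index = dict(index)
    for path in paths:
        temp_index[path] = worktree[path]
        new_index[path] = worktree[path]
    return temp_index, new_index  # (committed tree, index afterwards)


head = {"main.c": "v1", "util.c": "v1"}
index = {"main.c": "v1", "util.c": "v1", "new.c": "v1"}     # new.c was git-add'ed
worktree = {"main.c": "v2", "util.c": "v2", "new.c": "v1"}  # both files edited

current_tree, _ = commit_paths_current(index, worktree, ["main.c"])
proposed_tree, index_after = commit_paths_proposed(head, index, worktree, ["main.c"])

print("current: ", sorted(current_tree))   # new.c rides along
print("proposed:", sorted(proposed_tree))  # only main.c changes relative to HEAD
```

In both variants the edit to util.c stays out of the commit; the
difference is whether the previously added new.c is swept in.  Under
the proposal new.c remains in the main index, ready for a later
commit.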

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 23:16             ` Daniel Barkalow
@ 2006-01-31 23:36               ` Petr Baudis
  2006-01-31 23:47               ` Junio C Hamano
  1 sibling, 0 replies; 110+ messages in thread
From: Petr Baudis @ 2006-01-31 23:36 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Johannes Schindelin, Carl Baldwin, Junio C Hamano, Keith Packard,
	Martin Langhoff, Linus Torvalds, Git Mailing List

Dear diary, on Wed, Feb 01, 2006 at 12:16:26AM CET, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> said that...
> On Tue, 31 Jan 2006, Johannes Schindelin wrote:
> 
> > Hi,
> > 
> > On Mon, 30 Jan 2006, Carl Baldwin wrote:
> > 
> > > In general, I think it is grasping the reason for the index file and how 
> > > git commands like git-commit and git-diff interact with it.
> > 
> > IMHO this is the one big showstopper. I had problems explaining the 
> > concept myself.
> > 
> > For example, I had a hard time explaining to a friend why a git-add'ed 
> > file is committed when saying "git commit some_other_file", but not 
> > another (modified) file. Very unintuitive.
> 
> I sort of suspect that "git commit some_other_file" should really read 
> HEAD into a temporary index, update "some_other_file" in that (and the 
> main index), and commit it.

FWIW, this is also what cg-commit does.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 23:16             ` Daniel Barkalow
  2006-01-31 23:36               ` Petr Baudis
@ 2006-01-31 23:47               ` Junio C Hamano
  2006-02-01  0:38                 ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2006-01-31 23:47 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Johannes Schindelin, Carl Baldwin, Keith Packard,
	Martin Langhoff, Linus Torvalds, Git Mailing List

Daniel Barkalow <barkalow@iabervon.org> writes:

> I sort of suspect that "git commit some_other_file" should really read 
> HEAD into a temporary index, update "some_other_file" in that (and the 
> main index), and commit it.
> ...
> The surprising thing is that "git commit path ..." means
> "everything I've already mentioned, plus path..." not just
> "path ...", and it's particularly surprising because people
> only tend to specify paths when they've done something they
> don't want to commit.

Interesting idea, and a good point.

Not that I particularly would like to encourage people to make
partial commits by making it easier, but as long as we allow our
users to say "commit path...", your proposal would reduce the
confusion.

I wonder which is faster, to check if index differs from HEAD
and do the temporary index only when they differ, or always use
a temporary without checking?  The former needs one diff-index
--cached, zero or one read-tree, one write-tree and one
commit-tree.  The latter always needs one read-tree, one
write-tree and one commit-tree.

Wait.  We already do diff-index --cached during git-commit
anyway (it is in git-status).  Maybe with a bit of code
restructuring we can make the temporary index part optional.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 23:47               ` Junio C Hamano
@ 2006-02-01  0:38                 ` Linus Torvalds
  2006-02-01  0:52                   ` Junio C Hamano
                                     ` (2 more replies)
  0 siblings, 3 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01  0:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Daniel Barkalow, Johannes Schindelin, Carl Baldwin,
	Keith Packard, Martin Langhoff, Git Mailing List

On Tue, 31 Jan 2006, Junio C Hamano wrote:
> Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> > I sort of suspect that "git commit some_other_file" should really read 
> > HEAD into a temporary index, update "some_other_file" in that (and the 
> > main index), and commit it.
> > ...
> > The surprising thing is that "git commit path ..." means
> > "everything I've already mentioned, plus path..." not just
> > "path ...", and it's particularly surprising because people
> > only tend to specify paths when they've done something they
> > don't want to commit.
> 
> Interesting idea, and a good point.

One thing to be careful about is merges.

This actually happens to me:

	git pull ....

	.. uhhuh, trivial conflict in one file ..
	.. edit the/file/that/conflicted ..

	git commit the/file/that/conflicted

and there is no way that it would ever be correct to then just commit that 
one file. The fact that it's a merge means that the rest of the index - 
which is all from the merge, and correct - absolutely _must_ be committed 
too.

And yes, I could use "git commit -a" (and I often do), but the thing is, I 
surprisingly often have edits in unrelated files (stuff that the merge 
never touched), and doing "git commit -a" would do the wrong thing.

So the current "git commit filename" behaviour is actually the only 
possible correct one for a merge. Nothing else makes any sense 
what-so-ever.

Now, I can hear people arguing that "ok, merges are special, and for 
merges we always do it in the current index", but that makes "git commit 
pathname" act very _differently_ for a merge than for a normal commit. 
That just smells wrong to me.

So if you do this change (which may be the right one) then please make 
sure that "git commit <filename>" doesn't work _at_all_ when a merge is in 
progress (ie MERGE_HEAD exists), because it would do the wrong thing.

And yes, then I'll just have to force my fingers to do a simple

	git-update-index filename
	git commit

instead. I can do that.

Oh, one final suggestion: if you give a filename to "git commit", and you 
do the new semantics which means something _different_ than "do a 
git-update-index on that file and commit", then I'd really suggest that 
the _old_ index for that filename should match the parent exactly. 
Otherwise, you may have done a

	git diff filename

and you _thought_ you were committing just a two-line thing (because you 
didn't understand about the index), but another, earlier, action caused 
the index to be different from the file you had in HEAD, and in reality 
you're actually committing a much bigger diff.

In other words: if you want "git commit <filename>" to _not_ care about 
the current index, then it should make sure that the index at least 
_matches_ the current HEAD in the files mentioned.

Ie "git-diff-index --cached HEAD <filespec>" should return empty. Or 
something like that.
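
The two safety checks suggested above can be sketched as follows.
This is a toy Python model with trees and the index as dictionaries;
the CommitRefused exception and the check_commit_paths() and refusal()
helpers are hypothetical names for this illustration, not anything in
git itself.

```python
class CommitRefused(Exception):
    """Raised when "commit <paths>" would be unsafe under the new semantics."""


def check_commit_paths(head, index, paths, merge_in_progress):
    # Check 1: during a merge the whole index must be committed,
    # so naming individual paths is refused outright.
    if merge_in_progress:
        raise CommitRefused(
            "merge in progress: update the index and run a plain commit")

    # Check 2: for every named path the index must still match HEAD,
    # the analogue of an empty "git-diff-index --cached HEAD <filespec>".
    staged = sorted(p for p in paths if index.get(p) != head.get(p))
    if staged:
        raise CommitRefused("index differs from HEAD for: " + ", ".join(staged))


def refusal(head, index, paths, merge_in_progress=False):
    """Return the refusal message, or None if the commit may proceed."""
    try:
        check_commit_paths(head, index, paths, merge_in_progress)
    except CommitRefused as err:
        return str(err)
    return None


head = {"a.c": "v1", "b.c": "v1"}
staged_index = {"a.c": "v2", "b.c": "v1"}   # a.c was update-index'ed earlier

print(refusal(head, dict(head), ["a.c"]))           # None: safe to proceed
print(refusal(head, staged_index, ["a.c"]))         # refused: earlier staged change
print(refusal(head, dict(head), ["a.c"], True))     # refused: merge in progress
```

The second check is what protects someone who looked at "git diff
filename", saw a two-line change, and would otherwise commit a larger
diff that includes whatever was staged earlier.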

			Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  0:38                 ` Linus Torvalds
@ 2006-02-01  0:52                   ` Junio C Hamano
  2006-02-01  2:19                   ` Daniel Barkalow
  2006-02-01  6:42                   ` Junio C Hamano
  2 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01  0:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> One thing to be careful about is merges.
> ...
> So the current "git commit filename" behaviour is actually the only 
> possible correct one for a merge. Nothing else makes any sense 
> what-so-ever.

Agreed 100%, and I kind of feel silly about not mentioning that
myself.  It _might_ even make sense to reject explicit filenames
when MERGE_HEAD does not exist ;-).

> Oh, one final suggestion: if you give a filename to "git
> commit", and you do the new semantics which means something
> _different_ than "do a git-update-index on that file and
> commit", then I'd really suggest that the _old_ index for that
> filename should match the parent exactly.

That is also a good safety measure.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
       [not found]                   ` <20060201013901.GA16832@mail.com>
@ 2006-02-01  2:04                     ` Linus Torvalds
  2006-02-01  2:09                       ` Linus Torvalds
                                         ` (4 more replies)
  2006-02-01  2:52                     ` Martin Langhoff
  1 sibling, 5 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01  2:04 UTC (permalink / raw)
  To: Ray Lehtiniemi
  Cc: Alex Riesen, Radoslaw Szkodzinski, Keith Packard, Junio C Hamano,
	cworth, Martin Langhoff, Git Mailing List

On Tue, 31 Jan 2006, Ray Lehtiniemi wrote:
> 
> for what it's worth, it's certainly true here...  i'm using git to help
> me manage a similar project where i work.

Hmm.

We _could_ actually fairly easily add a flag to the index which means 
"don't even bother comparing - assume same", and then have specific 
operations to clear that flag.

That would allow people with slow filesystems (not just Windows: even 
under Linux, the cold-cache case is always going to be pretty slow) to 
have a _choice_: they could continue to use git as it is done now (explicit 
checks), _or_ they could mark all their index caches as "implicitly 
up-to-date" and use a separate program to mark them as being potentially 
edited.

We still have one unused bit in the cache-entry "ce_flags", so we wouldn't 
even need to break any existing index files with it.

We'd just need to have two new (fast) operations:

 - mark one or more files as being "implicitly up-to-date"

   "git checkout" would do this if the proper flag was set in the 
   .git/config file.

   "git-update-index --refresh" would do this for files that weren't 
   already implicitly up-to-date _and_ the refresh actually showed it to 
   match (and the .git/config file said so).

 - mark one or more files as _not_ being implicitly up-to-date:

   people would do this by hand when editing a file (or when just deciding 
   that they want git to re-check everything again)

They're fast, because they are purely in the cache (well, git-update-index 
obviously isn't, but the new op wouldn't be any _slower_ than the old 
one).

Looks simple enough. The big thing to remember is to clear that 
"implicitly up-to-date" flag whenever we make changes (ie we'd probably 
make "add_cache_entry()" always clear it, possibly with a flag to add it 
as "pre-verified" which would set it).
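
A minimal sketch of the two flag operations described above, assuming the
bit layout used by the follow-up patch in this thread (CE_VALID as 0x8000,
flags kept in network byte order). The struct and every toy_* helper are
hypothetical stand-ins for illustration, not git's actual code.

```c
#include <arpa/inet.h>  /* htons */
#include <assert.h>
#include <stddef.h>

/* Bit layout of the 16-bit flags word, as in the cache.h hunk in this thread. */
#define CE_NAMEMASK   0x0fff
#define CE_STAGEMASK  0x3000
#define CE_UPDATE     0x4000
#define CE_VALID      0x8000  /* the previously unused top bit */
#define CE_STAGESHIFT 12

/* Hypothetical, stripped-down cache entry: only the flags word matters here. */
struct toy_entry {
	unsigned short ce_flags;  /* network byte order */
};

/* Build a flags word from a name length and a merge stage; CE_VALID starts clear. */
unsigned short toy_create_flags(unsigned int len, unsigned int stage)
{
	return htons((unsigned short)((len & CE_NAMEMASK) | (stage << CE_STAGESHIFT)));
}

/* "mark as implicitly up-to-date" */
void toy_mark_valid(struct toy_entry *ce)
{
	assert(ce != NULL);
	ce->ce_flags |= htons(CE_VALID);
}

/* "mark as _not_ implicitly up-to-date" */
void toy_clear_valid(struct toy_entry *ce)
{
	assert(ce != NULL);
	ce->ce_flags &= (unsigned short)~htons(CE_VALID);
}

int toy_is_valid(const struct toy_entry *ce)
{
	assert(ce != NULL);
	return (ce->ce_flags & htons(CE_VALID)) != 0;
}

/*
 * Stand-in for the "add_cache_entry() always clears it" rule: any update
 * drops the flag unless the caller says the entry was pre-verified.
 */
void toy_add_entry(struct toy_entry *ce, int pre_verified)
{
	toy_clear_valid(ce);
	if (pre_verified)
		toy_mark_valid(ce);
}
```

Both operations only touch one bit of an in-memory word, which is why they
can be fast: neither needs to look at the working tree.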

Comments? Junio, what do you think?

> we're working on a vendor supplied tree which is also hacked upon
> by various VAR companies.  the tree in question has ~20,000 files
> totalling nearly 1.4 GB of source files, ms word docs, binary-only
> libraries for a wide array of processor variants, windows exe
> files, video clips, etc.  (however, the amount of actual source code
> interspersed in there is only about 6000 files totaling about 112 MB)
> 
> here's a repo sitting on the local linux filesystem with cold cache:
> 
>   reiserfs$ time git update-index --refresh
>    real    0m17.422s
>    user    0m0.025s
>    sys     0m0.320s

.. somewhat painful, but with enough memory this is hopefully a pretty 
rare case.

> and with hot cache
> 
>   reiserfs$ time git update-index --refresh
>    real    0m0.151s
>    user    0m0.020s
>    sys     0m0.067s

This is how it _should_ look.

But:

> for comparison, one of our sandboxes is sitting on an NTFS file system,
> accessed via SMB:
> 
>   smbfs$ time git update-index --refresh
>   real    11m36.502s
>   user    0m6.830s
>   sys     0m5.086s

Ouch, ouch, ouch.

Sounds like every single stat() will go out the wire. I forget what the 
Linux NFS client does, but I _think_ it has a metadata timeout that avoids 
this. But it might be as bad under NFS.

Has anybody used git over NFS? If it's this bad (or even close to), I 
guess the "mark files as up-to-date in the index" approach is a really 
good idea..

Of course, the whole point of git is that you should keep your repository 
close, but sometimes NFS - or similar - is enforced upon you by other 
issues, like the fact that the powers-that-be want anonymous workstations 
and everybody should work with a home-directory automounted over NFS..

			Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
@ 2006-02-01  2:09                       ` Linus Torvalds
  2006-02-09  5:15                         ` [PATCH] "Assume unchanged" git Junio C Hamano
  2006-02-01  2:31                       ` [Census] So who uses git? Junio C Hamano
                                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01  2:09 UTC (permalink / raw)
  To: Ray Lehtiniemi
  Cc: Alex Riesen, Radoslaw Szkodzinski, Keith Packard, Junio C Hamano,
	cworth, Martin Langhoff, Git Mailing List



On Tue, 31 Jan 2006, Linus Torvalds wrote:
> 
> We still have one unused bit in the cache-entry "ce_flags", so we wouldn't 
> even need to break any existing index files with it.

In case it wasn't clear, the _core_ of this optimization would be as 
simple as something like the appended.

The real meat is just making sure that CE_VALID gets set/cleared properly.

(That's also the most complex part, of course, but this trivial patch 
might help show the basic idea)

		Linus

---
diff --git a/cache.h b/cache.h
index bdbe2d6..7adc2e6 100644
--- a/cache.h
+++ b/cache.h
@@ -91,6 +91,7 @@ struct cache_entry {
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 #define CE_UPDATE    (0x4000)
+#define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
 #define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
diff --git a/read-cache.c b/read-cache.c
index c5474d4..738fe78 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -148,7 +148,16 @@ static int ce_match_stat_basic(struct ca
 
 int ce_match_stat(struct cache_entry *ce, struct stat *st)
 {
-	unsigned int changed = ce_match_stat_basic(ce, st);
+	unsigned int changed;
+
+	/*
+	 * If it's marked as always valid in the index, it's 
+	 * valid whatever the checked-out copy says
+	 */
+	if (ce->ce_flags & htons(CE_VALID))
+		return 0;
+
+	changed = ce_match_stat_basic(ce, st);
 
 	/*
 	 * Within 1 second of this sequence:
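
For readers who want to poke at the idea outside the git tree, here is a
self-contained sketch of the same short-circuit. It assumes a made-up
toy_entry that caches only mtime and size; the toy_* names and the
TOY_*_CHANGED bits are hypothetical, and the real ce_match_stat_basic()
compares more fields than this.

```c
#include <arpa/inet.h>  /* htons */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

#define CE_VALID 0x8000

#define TOY_MTIME_CHANGED 0x1u
#define TOY_SIZE_CHANGED  0x2u

/* Hypothetical cache entry: flags plus the two stat fields we compare. */
struct toy_entry {
	unsigned short ce_flags;  /* network byte order */
	time_t mtime;
	off_t size;
};

/* Plain comparison of cached stat data against a fresh stat result. */
unsigned int toy_match_stat_basic(const struct toy_entry *ce, const struct stat *st)
{
	unsigned int changed = 0;

	assert(ce != NULL && st != NULL);
	if (ce->mtime != st->st_mtime)
		changed |= TOY_MTIME_CHANGED;
	if (ce->size != st->st_size)
		changed |= TOY_SIZE_CHANGED;
	return changed;
}

/* Same shape as the patched function: a set CE_VALID bit wins outright. */
unsigned int toy_match_stat(const struct toy_entry *ce, const struct stat *st)
{
	assert(ce != NULL);
	if (ce->ce_flags & htons(CE_VALID))
		return 0;
	return toy_match_stat_basic(ce, st);
}
```

Note that this check by itself only ignores the stat result; to actually
save time on a slow filesystem the caller would presumably also have to
skip the lstat() call for entries that have the bit set.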

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  0:38                 ` Linus Torvalds
  2006-02-01  0:52                   ` Junio C Hamano
@ 2006-02-01  2:19                   ` Daniel Barkalow
  2006-02-01  6:42                   ` Junio C Hamano
  2 siblings, 0 replies; 110+ messages in thread
From: Daniel Barkalow @ 2006-02-01  2:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Johannes Schindelin, Carl Baldwin, Keith Packard,
	Martin Langhoff, Git Mailing List

On Tue, 31 Jan 2006, Linus Torvalds wrote:

> So if you do this change (which may be the right one) then please make 
> sure that "git commit <filename>" doesn't work _at_all_ when a merge is in 
> progress (ie MERGE_HEAD exists), because it would do the wrong thing.

Agreed. I suppose it could accept doing a commit of only a few files which 
weren't touched by the merge, but I don't think even you multitask enough 
to want to do that; anyway, the user can just ditch the merge, commit 
their stuff, and try the merge again. (I bet this is a case where new 
users would be really surprised by the behavior of "git commit filename", 
except that they wouldn't think it would do anything other than give an 
error.)

> And yes, then I'll just have to force my fingers to do a simple
> 
> 	git-update-index filename
> 	git commit
> 
> instead. I can do that.
>
> Oh, one final suggestion: if you give a filename to "git commit", and you 
> do the new semantics which means something _different_ than "do a 
> git-update-index on that file and commit", then I'd really suggest that 
> the _old_ index for that filename should match the parent exactly. 
> Otherwise, you may have done a
> 
> 	git diff filename
> 
> and you _thought_ you were committing just a two-line thing (because you 
> didn't understand about the index), but another, earlier, action caused 
> the index to be different from the file you had in HEAD, and in reality 
> you're actually committing a much bigger diff.
> 
> In other words: if you want "git commit <filename>" to _not_ care about 
> the current index, then it should make sure that the index at least 
> _matches_ the current HEAD in the files mentioned.
> 
> Ie "git-diff-index --cached HEAD <filespec>" should return empty. Or 
> something like that.

Agreed here, too.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
  2006-02-01  2:09                       ` Linus Torvalds
@ 2006-02-01  2:31                       ` Junio C Hamano
  2006-02-01  3:43                         ` Linus Torvalds
       [not found]                         ` <20060201045337.GC25753@mail.com>
  2006-02-01 16:15                       ` Jason Riedy
                                         ` (2 subsequent siblings)
  4 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01  2:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> They're fast, because they are purely in the cache (well, git-update-index 
> obviously isn't, but the new op wouldn't be any _slower_ than the old 
> one).
>
> Looks simple enough. The big thing to remember is to clear that 
> "implicitly up-to-date" flag whenever we make changes (ie we'd probably 
> make "add_cache_entry()" always clear it, possibly with a flag to add it 
> as "pre-verified" which would set it).
>
> Comments? Junio, what do you think?

Somehow this reminds me of a "feature" we added quite a long
time ago to support "update-index without working tree".

I think this should work fine as a mechanism, but I am a bit
worried about the convenience and safety aspect.  It _might_
make sense to do what RCS does; check out read-only copy by
default and set the "assume unchanged" flag, to prevent people
from accidentally modifying the working tree copy without
telling the index about it.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
       [not found]                   ` <20060201013901.GA16832@mail.com>
  2006-02-01  2:04                     ` Linus Torvalds
@ 2006-02-01  2:52                     ` Martin Langhoff
  2006-02-01  3:48                       ` Linus Torvalds
  2006-02-01 14:55                       ` Alex Riesen
  1 sibling, 2 replies; 110+ messages in thread
From: Martin Langhoff @ 2006-02-01  2:52 UTC (permalink / raw)
  To: Ray Lehtiniemi
  Cc: Alex Riesen, Linus Torvalds, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Git Mailing List

On 2/1/06, Ray Lehtiniemi <rayl@mail.com> wrote:
> by various VAR companies.  the tree in question has ~20,000 files
> totalling nearly 1.4 GB
...
>   reiserfs$ time git update-index --refresh

If you have such a tree, your workflow _must_ be such that you know
exactly what files you have changed. Asking any tool to go out and
"find which of my 20K files has changed" is doable, but it's just
magic that it works on recent linuxes.

> for comparison, one of our sandboxes is sitting on an NTFS file system,
> accessed via SMB:

you have the samba stack, network, SMB/CIFS stack and NTFS itself in
the middle. Replace the ethernet with carrier pigeons for a more
complete picture ;-)

Perhaps a local git/cygwin on NTFS  would be more reasonable to benchmark?

cheers,

martin

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:31                       ` [Census] So who uses git? Junio C Hamano
@ 2006-02-01  3:43                         ` Linus Torvalds
  2006-02-01  7:03                           ` Junio C Hamano
       [not found]                         ` <20060201045337.GC25753@mail.com>
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01  3:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, 31 Jan 2006, Junio C Hamano wrote:
> 
> I think this should work fine as a mechanism, but I am a bit
> worried about the convenience and safety aspect.  It _might_
> make sense to do what RCS does; check out read-only copy by
> default and set the "assume unchanged" flag, to prevent people
> from accidentally modifying the working tree copy without
> telling the index about it.

Yes, I think the "assume unchanged" flag goes well together with making 
sure that the checked-out file is non-writable at the time.

Of course, any number of editors and other actions won't care: if you do 
anything like

	for i in *.c
	do
		sed 's/xyzzy/bas/g' < "$i" > "$i.new"
		mv "$i.new" "$i"
	done

you'll never have even noticed that the old file was marked read-only. So 
it's obviously not in any way any guarantee, but it probably makes sense 
as a crutch.
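
The reason the loop above never notices the read-only bit is that rename()
is governed by the permissions of the directory, not of the file being
replaced. A small sketch of that effect, assuming a POSIX system; the
toy_* helper names are made up for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Write `text` to `path`, then set its permission bits. Returns 0 on success. */
int toy_write_file(const char *path, const char *text, mode_t mode)
{
	FILE *f;

	assert(path != NULL && text != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(text, f) == EOF) {
		fclose(f);
		return -1;
	}
	if (fclose(f) != 0)
		return -1;
	return chmod(path, mode);
}

/*
 * Replace `path` the way the sed/mv loop does: write "<path>.new",
 * then rename it over the original. The original file is never opened
 * for writing, so its own write bit is never consulted.
 */
int toy_replace_via_rename(const char *path, const char *new_text)
{
	char tmp[4096];
	int len = snprintf(tmp, sizeof(tmp), "%s.new", path);

	if (len < 0 || (size_t)len >= sizeof(tmp))
		return -1;
	if (toy_write_file(tmp, new_text, 0644) != 0)
		return -1;
	return rename(tmp, path);
}
```

Calling toy_write_file(path, ..., 0444) followed by
toy_replace_via_rename(path, ...) succeeds in a writable directory, and the
replacement file ends up writable again, so the read-only hint is silently
lost along the way.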

Your point that we discussed a similar flag for the "don't require a full 
checkout" is a good one: we should try to make sure that it works for both 
uses. Although maybe we decided for some reason that nobody cared about 
the non-checked-out case?

		Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:52                     ` Martin Langhoff
@ 2006-02-01  3:48                       ` Linus Torvalds
  2006-02-01 19:30                         ` H. Peter Anvin
  2006-02-01 14:55                       ` Alex Riesen
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01  3:48 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Ray Lehtiniemi, Alex Riesen, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Git Mailing List

On Wed, 1 Feb 2006, Martin Langhoff wrote:
> 
> If you have such a tree, your workflow _must_ be such that you know
> exactly what files you have changed. Asking any tool to go out and
> "find which of my 20K files has changed" is doable, but it's just
> magic that it works on recent linuxes.

It's not magic, and it's not all that recent. Linux FS ops have always 
been pretty good, and the dentry cache was introduced in 2.0.x, I think, 
so you'd be hard-pressed to find a Linux system that doesn't have it.

Now, I bet Linux will be better (often by a factor of 2-3) than most other 
systems, but that still doesn't mean that 20k files is totally 
unreasonable on other setups. 

I suspect cygwin is worse than most because (a) the NT VFS layer is 
piss-poor and you need a kernel service to get good performance and (b) 
cygwin probably adds its own overhead for handling symlinks, so the 
"lstat()" call is probably even more expensive.

Now, the networked filesystems are a potential problem for everybody.

		Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
       [not found]                         ` <20060201045337.GC25753@mail.com>
@ 2006-02-01  5:04                           ` Linus Torvalds
  2006-02-01  5:42                           ` Junio C Hamano
  1 sibling, 0 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01  5:04 UTC (permalink / raw)
  To: Ray Lehtiniemi; +Cc: Junio C Hamano, git



On Tue, 31 Jan 2006, Ray Lehtiniemi wrote:
> 
> what if the user wants to change the mode bits of an assume-unchanged
> file with the twiddled permissions, but forgets to clear the flag
> first?  seems like that change is likely to get lost, especially if the
> new mode is read-only....

Remember - git only cares about execute permissions. The write permissions 
are entirely ignored by git ..

		Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
       [not found]                         ` <20060201045337.GC25753@mail.com>
  2006-02-01  5:04                           ` Linus Torvalds
@ 2006-02-01  5:42                           ` Junio C Hamano
  1 sibling, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01  5:42 UTC (permalink / raw)
  To: Ray Lehtiniemi; +Cc: Linus Torvalds, git

Ray Lehtiniemi <rayl@mail.com> writes:

> what if the user wants to change the mode bits of an assume-unchanged
> file with the twiddled permissions, but forgets to clear the flag
> first?  seems like that change is likely to get lost, especially if the
> new mode is read-only....

No problem, since we only record u+x bit and nothing else.  Most
importantly, we do not record any of the +w bits.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  0:38                 ` Linus Torvalds
  2006-02-01  0:52                   ` Junio C Hamano
  2006-02-01  2:19                   ` Daniel Barkalow
@ 2006-02-01  6:42                   ` Junio C Hamano
  2006-02-01  7:22                     ` Carl Worth
                                       ` (2 more replies)
  2 siblings, 3 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01  6:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> Oh, one final suggestion: if you give a filename to "git commit", and you 
> do the new semantics which means something _different_ than "do a 
> git-update-index on that file and commit", then I'd really suggest that 
> the _old_ index for that filename should match the parent exactly. 
> Otherwise, you may have done a
>
> 	git diff filename
>
> and you _thought_ you were committing just a two-line thing (because you 
> didn't understand about the index), but another, earlier, action caused 
> the index to be different from the file you had in HEAD, and in reality 
> you're actually committing a much bigger diff.

This "I thought I was only checking in the two-liner I did as
the last step but you committed the whole thing, stupid git!"
confusion feels to be a parallel of "I thought I was only
checking in the files I specified on the command line but you
also committed the files I earlier git-add'ed, stupid git!"
confusion.

Taken together with your "during a partially conflicted merge"
example, it feels to me that the simplest safety valve would be
to refuse "git commit paths..." if the index does not exactly
match HEAD.  Not just mentioned paths but anywhere.
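
To make the proposed safety valve concrete, here is a toy model of the
check, assuming the index and HEAD are each just a flat list of
(path, blob id) pairs. The struct, the helper names, and the blob ids are
all made up for illustration; stages and modes are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat snapshot entry: a path and the id of its contents. */
struct toy_path {
	const char *path;
	const char *blob;
};

/* Return the blob id recorded for `path`, or NULL if it is not listed. */
const char *toy_lookup(const struct toy_path *set, size_t n, const char *path)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(set[i].path, path) == 0)
			return set[i].blob;
	return NULL;
}

/*
 * Return 1 if the index and HEAD agree on every path (nothing added,
 * removed, or changed), 0 otherwise. "git commit paths..." would be
 * refused whenever this returns 0.
 */
int toy_index_matches_head(const struct toy_path *index, size_t n_index,
			   const struct toy_path *head, size_t n_head)
{
	size_t i;

	for (i = 0; i < n_index; i++) {
		const char *blob = toy_lookup(head, n_head, index[i].path);
		if (!blob || strcmp(blob, index[i].blob) != 0)
			return 0;
	}
	for (i = 0; i < n_head; i++)
		if (!toy_lookup(index, n_index, head[i].path))
			return 0;
	return 1;
}
```

Under this model an index that contains a freshly added file which HEAD
does not know about fails the check, so the whole-index rule would refuse
a path-limited commit in that state.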

People who do not like this can set in their config file some
flag, say, 'core.index = understood', to get the current
behaviour.

The reason I am bringing this up is because of this command
sequence:

	# start from a clean tree, after 'git reset --hard'
        $ create a-new-file
        $ git add a-new-file
        $ edit existing-file
        $ edit another-file
        $ git commit existing-file

There is no question we do not commit "another-file" and we do
commit changes to the "existing-file" as a whole.  What should
we do to "a-new-file", and how do we explain why we do so to
novices?

We can argue it either way.  We could say we shouldn't because
"commit" argument does not mention it.  We could say we should
because the user already told that he wants to add that file to
git.  Either makes sort-of sense from what the end user did.

I think a file "cvs add"ed is committed if whole subdirectory
commit (similar to our "commit -a") is done or the file is
explicitly specified on the "cvs commit" command line, and that
may match people's expectations.  That's an argument for not
committing "a-new-file".  But to be consistent with that, this
should not commit anything:

        # the same clean tree.
	$ create a-new-file
        $ git add a-new-file
        $ git commit

Which is counterintuitive to me by now (because I played too
long with git).

We could make "git commit" without paths to mean the current
"-a" behaviour, which would match CVS behaviour more closely.
However, it would make commit after a merge conflict resolution
in a dirty working tree _very_ dangerous -- it may give more
familiar feel to CVS people, but it is not an improvement for
git people at all.  I would rather not.

Right now, "git add" means "stage this for the next commit in
the index".  If we change the semantics of "git add" to mean "I
am not adding it for the next commit yet; I am just letting you
know there is a file in the working tree so that you can keep an
eye on it for me", using the intent-to-add index entry I've
mentioned a couple of times, I think the above problem might
naturally be solved.  For people who do not use update-index,
"commit -a" and "commit paths..." are the only two ways to
actually check-in anything to the index file for the next
commit ("git add" alone does not count).  "commit -a" would do
the equivalent of current "update all the not-up-to-date files to
the index and then commit", which would include the intent-to-add
paths.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  3:43                         ` Linus Torvalds
@ 2006-02-01  7:03                           ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01  7:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> Your point that we discussed a similar flag for the "don't require a full 
> checkout" is a good one: we should try to make sure that it works for both 
> uses. Although maybe we decided for some reason that nobody cared about 
> the non-checked-out case?

We gave them a way to add --cacheinfo but did not do any more
than that, because they are independently coming up with some
hash (not necessarily a proper git blob object name), they
did not have the huge blob data with the working tree anyway,
and the only thing they cared about was which paths changed and
they did not even want to see how the contents changed.
I.e. "diff-tree -r" was the only thing they cared about.

If we end up doing "assume unchanged", I should remember to do a
sensible thing for "diff-index" without --cached.  It should not
look at the working tree file for paths marked as such.  This
implies one optimization in "diff-index -p" and "diff-tree -p"
may need to be disabled.  They cheat and avoid expanding blob
objects when their cache entries are clean and required blobs
are in the working tree.  If "assume unchanged" path was
actually changed, such a diff would show up as a confusing
unexpected change.

Well, the user is asking for it, so that confusion is not _my_
problem, though ;-).

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  6:42                   ` Junio C Hamano
@ 2006-02-01  7:22                     ` Carl Worth
  2006-02-01  8:26                       ` Junio C Hamano
  2006-02-01 17:11                     ` Linus Torvalds
  2006-02-01 17:18                     ` Nicolas Pitre
  2 siblings, 1 reply; 110+ messages in thread
From: Carl Worth @ 2006-02-01  7:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

[-- Attachment #1: Type: text/plain, Size: 2269 bytes --]

On Tue, 31 Jan 2006 22:42:05 -0800, Junio C Hamano wrote:
> 
> There is no question we do not commit "another-file" and we do
> commit changes to the "existing-file" as a whole.  What should
> we do to "a-new-file", and how do we explain why we do so to
> novices?

I'll offer a couple of ill-informed comments from a novice's
point-of-view if I may.

My first exposure to git (about 1 week ago) was "A short git
tutorial" [*]

I found the discussion of the index, git-update-index, and the subtle
distinctions between the various git-diff commands rather intimidating
for an initial introduction. After getting to know the system better
over the past week, it seems it should be possible to have a class of
"novice ready" tools that provide for common use cases and that never
require any mention of the index in their documentation. If so, that
seems to me a useful goal to work toward and a useful guide in this
discussion.

> We could make "git commit" without paths to mean the current
> "-a" behaviour, which would match CVS behaviour more closely.

Again, my novice experience leads me to favor that change. After
reading the tutorial, I had the following sequence in mind for
committing an edited file:

	git update-index edited-file
	git commit

which seemed like more pain than strictly necessary. The next day,
when I went to the linux.conf.au tutorial and saw Linus use:

	git commit -a

for the same operation it was a breath of fresh air. I was left
scratching my head wondering why the -a behavior wasn't the default
for "git commit" with no paths.

> However, it would make commit after a merge conflict resolution
> in a dirty working tree _very_ dangerous -- it may give more
> familiar feel to CVS people, but it is not an improvement for
> git people at all.  I would rather not.

I'm still not "git people" I guess. Could you explain what the danger
is here? And is it something the tool could detect and prevent?

-Carl

[*] http://www.kernel.org/pub/software/scm/git/docs/core-tutorial.html

A better initial introduction for me would likely have been "A
tutorial introduction to git":

http://www.kernel.org/pub/software/scm/git/docs/tutorial.html

so a link to the latter from the first paragraph or so of the former
might be very helpful.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  7:22                     ` Carl Worth
@ 2006-02-01  8:26                       ` Junio C Hamano
  2006-02-01  9:59                         ` Randal L. Schwartz
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01  8:26 UTC (permalink / raw)
  To: Carl Worth; +Cc: Linus Torvalds, git

Carl Worth <cworth@cworth.org> writes:

> ... it seems it should be possible to have a class of
> "novice ready" tools that provide for common use cases and that never
> require any mention of the index in their documentation. If so, that
> seems to me a useful goal to work toward and a useful guide in this
> discussion.

I agree it is a worthy goal.  Unfortunately I lost my git
virginity a long time ago, so a fresh perspective is really
appreciated in this discussion.

> ... Could you explain what the danger is here?

As Linus mentioned in an earlier message in this thread, one of
the important tasks for him is to take other people's trees and
merge them into his mainline.  The workflow goes like this:

	$ git pull from-somewhere
        ... oops there are conflicts
        $ edit conflicted/file
        $ edit more/conflicted/file
        ... maybe compile test ...
	$ git diff -c ;# final sanity check
        $ git update-index conflicted/file
        $ git update-index more/conflicted/file
        $ git commit

He does *not* want to do "git commit -a" here, because he
usually has unrelated changes in his working tree he has not
done update-index on and does _not_ want to commit [*1*].  "git
commit" to imply "git commit -a" increases the risk of
accidentally committing those unrelated changes mixed in the
merge (eh, actually makes the risk 100%).

We _could_ detect that we were in the middle of a merge,
enumerate the paths touched by the merged branches.  Then we can
say paths that are different between the index and the working
tree and not in the paths touched by the merge are his unrelated
changes.  But it is conceivable he may need to modify a file
neither branch touches in order to _logically_ resolve the
merge, even when the merge physically does not conflict on a
textual diff basis, so while that heuristic may work pretty
well most of the time, doing so might make things even harder
to explain to other people.
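
The heuristic described above boils down to a set difference. A rough
sketch, assuming both inputs are already available as plain lists of path
names; the function names and the example paths are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `path` appears in `list`, 0 otherwise. */
int toy_contains(const char *const *list, size_t n, const char *path)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(list[i], path) == 0)
			return 1;
	return 0;
}

/*
 * `dirty`  : paths that differ between the index and the working tree.
 * `merged` : paths touched by either of the merged branches.
 * Copies into `out` (which must have room for n_dirty pointers) every
 * dirty path the merge did not touch, and returns how many were copied.
 * Those are the paths the heuristic would call "unrelated changes".
 */
size_t toy_unrelated_changes(const char *const *dirty, size_t n_dirty,
			     const char *const *merged, size_t n_merged,
			     const char **out)
{
	size_t i, count = 0;

	for (i = 0; i < n_dirty; i++)
		if (!toy_contains(merged, n_merged, dirty[i]))
			out[count++] = dirty[i];
	return count;
}
```

If the dirty list is {"conflicted/file", "Makefile"} and the merge touched
only "conflicted/file", then "Makefile" is reported as unrelated. That is
exactly the weak spot: if the Makefile edit was needed to resolve the merge
logically, the heuristic would leave it out of the commit.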

[Footnotes]

*1* The reason he has unrelated changes while doing a merge is
because he works on things himself (I am speculating about
this), and for these modified paths he never runs git-add nor
git-update-index until he is ready to commit his changes (I am
not speculating about this).  As long as he knows what he is
pulling in from outside does not overlap with what he has been
working on, he can merge and commit the result without worrying
about his own unrelated changes, and git is careful not to touch
anything in his working tree to cause information loss when the
changes do overlap [*2*].

He is committing something that he never tested himself in his
working tree as a whole.  The tree resulting from the merge
never existed outside his index file, so there is no way he
could have even compile tested it properly.  But for somebody
who is playing an integrator's role, it is not his primary job
to examine and test every change he merges in as a whole at
nitty-gritty level -- that is what the originator of the change
should have done.  So having uncommitted changes in the working
tree for an integrator person is not a sign of bad discipline at
all, and supporting this workflow _is_ important for git.

The primary reason I first got involved in git was because I
wanted to help the workflow of the kernel people, especially
Linus and the subsystem maintainers.  To be honest, I personally
still consider the kernel people the first tier customers for
me, and I stop and try to think twice when thinking about a
change or a new feature that may help individual developers and
newcomers, to make sure such a change does not make life less
convenient for the 'integrator' people.  Helping integrators to
be more efficient is important because they can become
bottlenecks.

*2* I once got yelled at by Linus when I carelessly broke this
feature and changed 'git-merge' to require a clean working tree
without changes before starting a merge; it was quickly
reverted.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  8:26                       ` Junio C Hamano
@ 2006-02-01  9:59                         ` Randal L. Schwartz
  2006-02-01 20:48                           ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Randal L. Schwartz @ 2006-02-01  9:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carl Worth, Linus Torvalds, git

>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:

Junio> *1* The reason he has unrelated changes while doing a merge is
Junio> because he works on things himself (I am speculating about
Junio> this),

You need to speculate that Linus works on things himself? :)

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-01-31 22:55                   ` Joel Becker
@ 2006-02-01 14:43                     ` Johannes Schindelin
  0 siblings, 0 replies; 110+ messages in thread
From: Johannes Schindelin @ 2006-02-01 14:43 UTC (permalink / raw)
  To: Joel Becker
  Cc: Linus Torvalds, Keith Packard, Carl Baldwin, Junio C Hamano,
	Martin Langhoff, Git Mailing List

Hi,

On Tue, 31 Jan 2006, Joel Becker wrote:

> On Tue, Jan 31, 2006 at 11:21:52AM -0800, Linus Torvalds wrote:
> > Now, I do agree. I don't actually like hiding the index too much. 
> > Understanding the index is _invaluable_ whenever you're doing a merge with 
> > conflicts, and understanding what tools are available to you to resolve 
> > those conflicts.
> 
> 	This is precisely the experience I've had explaining GIT to
> folks moving to it.  The simplest workflow (clone; hack one file, commit
> one file) is so similar to CVS/Subversion/Anything that it's immediately
> understood.  But when pull, push, merge, and any non-linear history are
> discussed, I have to describe the index and the commit/tree layout.
> Once I do, they get it.
> 
> > So I'm actually of the "revel in the index" camp (as could probably be 
> > guessed by the original tutorial).
> 
> 	I'm going to second this, from a real-world "explain it to
> others" standpoint.

How about talking about the index a bit at the end of tutorial.txt like 
this:

-- snip --
For a number of (mostly technical) reasons, "git diff" does not show the 
changes of the current working directory with respect to the latest 
commit, but rather to an intermediate stage: the "index".

Think of the index as a staging area just before committing: the commit 
object (and the tree and blob objects referenced from it) are assembled 
there.

Also, when you checkout, the index is used to disassemble the commit 
object just before writing the corresponding files and directories.
-- snap --

May this be worth the work?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01  2:52                     ` Martin Langhoff
  2006-02-01  3:48                       ` Linus Torvalds
@ 2006-02-01 14:55                       ` Alex Riesen
  2006-02-01 16:25                         ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Alex Riesen @ 2006-02-01 14:55 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Ray Lehtiniemi, Linus Torvalds, Radoslaw Szkodzinski,
	Keith Packard, Junio C Hamano, cworth, Git Mailing List

On 2/1/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Perhaps a local git/cygwin on NTFS  would be more reasonable to benchmark?

$ time git update-index --refresh

real    0m21.500s
user    0m0.358s
sys     0m1.406s

WinNT, NTFS, 13k files, hot cache.


* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
  2006-02-01  2:09                       ` Linus Torvalds
  2006-02-01  2:31                       ` [Census] So who uses git? Junio C Hamano
@ 2006-02-01 16:15                       ` Jason Riedy
  2006-02-01 19:20                       ` Julian Phillips
  2006-02-06 21:15                       ` Chuck Lever
  4 siblings, 0 replies; 110+ messages in thread
From: Jason Riedy @ 2006-02-01 16:15 UTC (permalink / raw)
  Cc: Git Mailing List

And Linus Torvalds writes:
 - 
 - Has anybody used git over NFS? If it's this bad (or even close to), I 
 - guess the "mark files as up-to-date in the index" approach is a really 
 - good idea..

My normal use is on NFS (Solaris and Linux) and IBM's GPFS 
(AIX and Linux).  I haven't noticed any particular problems, 
and LAPACK and the reference BLAS make a moderately sized 
working set of around 3000 source files.  Not kernel sized, 
but not tiny.

However, I mostly use git over NFS on a relatively slow 
machine.  NFS is faster than the local disk...

Jason


* Re: [Census] So who uses git?
  2006-02-01 14:55                       ` Alex Riesen
@ 2006-02-01 16:25                         ` Linus Torvalds
  2006-02-02  9:12                           ` Alex Riesen
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01 16:25 UTC (permalink / raw)
  To: Alex Riesen
  Cc: Martin Langhoff, Ray Lehtiniemi, Radoslaw Szkodzinski,
	Keith Packard, Junio C Hamano, cworth, Git Mailing List

On Wed, 1 Feb 2006, Alex Riesen wrote:
> 
> $ time git update-index --refresh
> 
> real    0m21.500s
> user    0m0.358s
> sys     0m1.406s
> 
> WinNT, NTFS, 13k files, hot cache.

That's 25% less files than the Linux kernel, and I can do that operation 
in 0m0.062s (0.012s user, 0.048s system).

So WinNT/cygwin is about 2.5 _orders_of_magnitude_ slower here, or 340 
times slower.
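
The arithmetic behind that estimate can be checked from the two "real" times quoted in this thread. This is only a back-of-the-envelope sketch: the variable names are made up, and the two measurements come from different machines and trees.

```shell
# Ratio of the two wall-clock times quoted above, and its base-10 logarithm.
nt=21.500      # "real" time reported on WinNT/cygwin, in seconds
linux=0.062    # time reported on Linux, in seconds
result=$(LC_ALL=C awk -v a="$nt" -v b="$linux" \
    'BEGIN { r = a / b; printf "%.0f %.2f", r, log(r) / log(10) }')
echo "$result"   # ratio, then orders of magnitude
```

The ratio comes out at roughly 350, in line with the "about 2.5 orders of magnitude" and "340 times" figures.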

Now, I'm tempted to say that NT is a piece of sh*t, but the fact is, your 
CPU-times seem to indicate that most of it is IO (and the "real" cost is 
just 1.7 seconds, much of which is system time, which in turn itself is 
probably due to the IO costs too - so even that isn't comparable with 
the ).

Which may mean that you simply don't have enough memory to cache the whole 
thing. Which may be NT sucking, of course ("we don't like to use more than 
10% of memory for caches"), but it might also be a tunable (which is sucky 
in itself, of course), but finally, it might just be that you just don't 
have a ton of memory. I've got 2GB in my machines, although 1GB is plenty 
to cache the kernel.

			Linus


* Re: [Census] So who uses git?
  2006-02-01  6:42                   ` Junio C Hamano
  2006-02-01  7:22                     ` Carl Worth
@ 2006-02-01 17:11                     ` Linus Torvalds
  2006-02-01 17:18                     ` Nicolas Pitre
  2 siblings, 0 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01 17:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, 31 Jan 2006, Junio C Hamano wrote:
> 
> Taken together with your "during a partially conflicted merge"
> example, it feels to me that the simplest safety valve would be
> to refuse "git commit paths..." if the index does not exactly
> match HEAD.  Not just mentioned paths but anywhere.

But at that point, the existing "git commit" semantics actually are the 
ones we'd use, and the only difference ends up being that we error out 
if the index doesn't match HEAD.

The problem with that is that it appears that some of the people who don't 
like the current "git commit <filename>" thing _do_ actually understand 
the index, but they want to commit just that one file. 

So at least from my understanding, I think Dscho was arguing for the new 
semantics of "git commit <file>" to _work_, but to only commit <file>, 
even if he does understand the index perfectly well, and might have done a 
"git add" or updated a file for some other reason..

Btw, one thing that _can_ be confusing is that you do

	git commit fileA

and then when you edit the commit message, you realize that you don't 
actually want to do this at all, so you exit out of the editor without 
changes (which aborts the commit). Now "git commit" will not actually have 
done the commit, but it _will_ have done the "git-update-index" on that 
file.

So next time, when you do

	git commit fileB

you'll currently commit _both_ fileA and fileB.
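
A minimal sketch of that trap follows. Two things are assumptions here: current git releases treat "git commit <paths>" differently from the behaviour described in this mail, so the index-based commit is reproduced with the --include option, and the leftover of the aborted commit is simulated with a direct git update-index call. File names and contents are made up.

```shell
# Sketch: a stale index entry for fileA rides along with a later commit of fileB.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)              # throwaway repository (assumption)
cd "$repo"
git init -q
echo a > fileA
echo b > fileB
git add fileA fileB
git commit -q -m initial

echo a2 >> fileA
echo b2 >> fileB
git update-index fileA         # what the aborted "git commit fileA" left behind

# Index-based commit: stage fileB on top of whatever the index already holds.
git commit -q -m "meant to commit fileB only" --include -- fileB

committed=$(git diff-tree -r --name-only --no-commit-id HEAD | xargs)
echo "$committed"
```

Both files end up in the commit even though only fileB was named.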

This is, in my opinion, the biggest argument for the suggested _new_ 
semantics: if you explicitly name a set of files, it should always do a

	# Verify current state
	parent=$(git-rev-parse --verify HEAD) || exit

	# Verify that the current index is ok in the named files
	a=$(git-diff-index --name-only --cached $parent "$@") || exit
	if [ "$a" ]; then
	   echo -e >&2 "Files are changed in the index:\n  $a"
	   exit 2
	fi

	# create the new tree object
	export GIT_INDEX_FILE=tmpfile
	newtree=$(git-read-tree $parent &&
	  git-update-index "$@" &&
	  git-write-tree) || exit

	# edit message
	... edit message ..

	# do commit
	newhead=$(git-commit-tree $newtree -p $parent < msg)
	git-update-ref HEAD $newhead $parent

or similar. That has the advantage that if we _do_ decide to break out of 
the commit, we will not have changed the current index (only the temporary 
one).
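
For anyone who wants to try the idea, here is a self-contained sketch of the temporary-index commit. It is an illustration under assumptions, not the eventual implementation: the repository, file names, commit message and the tmp-index path are made up, and it uses the "git <command>" spelling of the plumbing commands. It also refreshes the real index for the committed path afterwards, which the outline above does not do.

```shell
# Sketch: commit only the named paths via a temporary index.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)              # throwaway repository (assumption)
cd "$repo"
git init -q
echo a > fileA
echo b > fileB
git add fileA fileB
git commit -q -m initial
echo a2 >> fileA
echo b2 >> fileB
git add fileB                  # staged in the real index; must not be committed

paths=fileA                    # the paths named on the command line
parent=$(git rev-parse --verify HEAD)

# Refuse if the named paths already differ from HEAD in the real index.
changed=$(git diff-index --cached --name-only "$parent" -- $paths)
if [ -n "$changed" ]; then
    echo "Files are changed in the index: $changed" >&2
    exit 2
fi

# Build the new tree in a temporary index; the file must not exist beforehand.
tmp_index="$repo/.git/tmp-index.$$"
newtree=$(
    export GIT_INDEX_FILE="$tmp_index"
    git read-tree "$parent" &&
    git update-index -- $paths &&
    git write-tree
)
rm -f "$tmp_index"

# Create the commit and move HEAD only if it still points at $parent.
newhead=$(echo "fileA only" | git commit-tree "$newtree" -p "$parent")
git update-ref HEAD "$newhead" "$parent"
git update-index -- $paths     # bring the real index in line for the committed path

committed=$(git diff-tree -r --name-only --no-commit-id HEAD)
still_staged=$(git diff --cached --name-only)
echo "committed: $committed; still staged: $still_staged"
```

Because the real index is only touched after the commit exists, bailing out at any earlier step leaves it exactly as it was.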

		Linus


* Re: [Census] So who uses git?
  2006-02-01  6:42                   ` Junio C Hamano
  2006-02-01  7:22                     ` Carl Worth
  2006-02-01 17:11                     ` Linus Torvalds
@ 2006-02-01 17:18                     ` Nicolas Pitre
  2006-02-01 20:27                       ` Junio C Hamano
  2 siblings, 1 reply; 110+ messages in thread
From: Nicolas Pitre @ 2006-02-01 17:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

On Tue, 31 Jan 2006, Junio C Hamano wrote:

> This "I thought I was only checking in the two-liner I did as
> the last step but you committed the whole thing, stupid git!"
> confusion feels to be a parallel of "I thought I was only
> checking in the files I specified on the command line but you
> also committed the files I earlier git-add'ed, stupid git!"
> confusion.
> 
> Taken together with your "during a partially conflicted merge"
> example, it feels to me that the simplest safety valve would be
> to refuse "git commit paths..." if the index does not exactly
> match HEAD.  Not just mentioned paths but anywhere.
> 
> People who do not like this can set in their config file some
> flag, say, 'core.index = understood', to get the current
> behaviour.

I'd avoid hidden config options that magically change behaviors and 
semantics like that as much as possible.  _This_ would pave the way to 
even greater confusion and prevent the git user base from converging on 
a unified semantics knowledge.  Better add a command line option which 
has the virtue of being visible, and name it such that it makes the 
intention explicit whether the previous index state is preserved or not,
something like --current-index or the like.

> The reason I am bringing this up is because of this command
> sequence:
> 
> 	# start from a clean tree, after 'git reset --hard'
>         $ create a-new-file
>         $ git add a-new-file
>         $ edit existing-file
>         $ edit another-file
>         $ git commit existing-file
> 
> There is no question we do not commit "another-file" and we do
> commit changes to the "existing-file" as a whole.  What should
> we do to "a-new-file", and how do we explain why we do so to
> novices?
> 
> We can argue it either way.  We could say we shouldn't because
> "commit" argument does not mention it.  We could say we should
> because the user already told that he wants to add that file to
> git.  Either makes sort-of sense from what the end user did.

It is much more intuitive to expect that, if you specify path arguments 
to commit, then only those paths are considered, and even if you didn't 
do a git add on some of them.  If nothing is specified then the current 
index (the default, including a-new-file) is considered.

> I think a file "cvs add"ed is committed if whole subdirectory
> commit (similar to our "commit -a") is done or the file is
> explicitly specified on the "cvs commit" command line, and that
> may match people's expectations.  That's an argument for not
> committing "a-new-file".

Exactly.

> But to be consistent with that, this should not commit anything:
> 
>         # the same clean tree.
> 	$ create a-new-file
>         $ git add a-new-file
>         $ git commit
> 
> Which is counterintuitive to me by now (because I played too
> long with git).

IMHO this should commit a-new-file simply because you added it to the 
index and a commit without any argument should commit the whole 
(refreshed) index.

> We could make "git commit" without paths to mean the current
> "-a" behaviour, which would match CVS behaviour more closely.

Exactly.

> However, it would make commit after a merge conflict resolution
> in a dirty working tree _very_ dangerous -- it may give more
> familiar feel to CVS people, but it is not an improvement for
> git people at all.  I would rather not.

For that case, (assuming that -a would be the default) maybe something 
meaning the opposite of -a could be specified on the commit argument 
list like I suggested earlier.  And maybe it should always be the 
default when committing a merge (in which case the -a would override 
that and refresh everything and not only the merged files plus those 
specified on the command line).

So to summarize:

 - a non-merge commit without any argument would imply -a.

 - a non-merge commit with path arguments implies _only_ those paths, 
   regardless if they were previously "git add"ed or not.

 - a non-merge commit with, say, --no-auto or --current-index or 
   whatever would preserve the current behavior, with or without 
   additional paths.

 - a merge commit would imply that --no-auto behavior automatically.

 - a merge commit could override the --no-auto with an explicit -a.

This might look complicated when presented like that, but I think that 
the default behavior of each (non-merge vs merge) commit would more 
closely fit most people's expectations.  The merge commit creates a shift 
in semantics of course, but committing a merge is already something a 
bit more involved anyway and at that point git users should have gained 
a bit more experience with the index concept and the default merge 
behavior is probably what most people will expect at that point as well.

Nicolas


* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
                                         ` (2 preceding siblings ...)
  2006-02-01 16:15                       ` Jason Riedy
@ 2006-02-01 19:20                       ` Julian Phillips
  2006-02-01 19:29                         ` Linus Torvalds
  2006-02-06 21:15                       ` Chuck Lever
  4 siblings, 1 reply; 110+ messages in thread
From: Julian Phillips @ 2006-02-01 19:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ray Lehtiniemi, Alex Riesen, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Martin Langhoff, Git Mailing List

On Tue, 31 Jan 2006, Linus Torvalds wrote:

> Sounds like every single stat() will go out the wire. I forget what the
> Linux NFS client does, but I _think_ it has a metadata timeout that avoids
> this. But it might be as bad under NFS.
>
> Has anybody used git over NFS? If it's this bad (or even close to), I
> guess the "mark files as up-to-date in the index" approach is a really
> good idea..

As it happens, yes ... I can't say that I've noticed git being 
particularly slow, but then - I've not tried running git with a local 
repos ... ;)

using a recentish 2.6 kernel repos, directly on the server I get:

server: linux-2.6>time git update-index --refresh

real    0m0.067s
user    0m0.015s
sys     0m0.052s

then against the same repos over NFS, I get:

client: linux-2.6>time git update-index --refresh

real    0m1.578s
user    0m0.018s
sys     0m0.366s

and if I do it from the client again soon afterward I get:

client: linux-2.6>time git update-index --refresh

real    0m0.145s
user    0m0.012s
sys     0m0.118s

>
> Of course, the whole point of git is that you should keep your repository
> close, but sometimes NFS - or similar - is enforced upon you by other
> issues, like the fact that the powers-that-be want anonymous workstations
> and everybody should work with a home-directory automounted over NFS..
>

-- 
Julian

  ---
You know it's going to be a bad day when you want to put on the clothes
you wore home from the party and there aren't any.


* Re: [Census] So who uses git?
  2006-02-01 19:20                       ` Julian Phillips
@ 2006-02-01 19:29                         ` Linus Torvalds
  0 siblings, 0 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01 19:29 UTC (permalink / raw)
  To: Julian Phillips
  Cc: Ray Lehtiniemi, Alex Riesen, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Martin Langhoff, Git Mailing List

On Wed, 1 Feb 2006, Julian Phillips wrote:
> 
> As it happens, yes ... I can't say that I've noticed git being particularly
> slow, but then - I've not tried running git with a local repos ... ;)

Well, NFS seems to be ok. Which is not that surprising: NFS has gotten a 
_lot_ of attention in the caching area (I worked on it myself a couple of 
years back when the page cache transition happened during 2.3.x, but 
happily we've had very good NFS maintainership since, so I don't get 
involved any more).

Your numbers show that NFS is fine (my "benchmark" is that I refuse to see 
the kinds of commit times that "cvs commit" does - easily several minutes 
for a big project. If it goes over 2 seconds, it's painful, and over ten 
seconds is totally unacceptable).

Your numbers seem to say that at least with a good network/server, NFS on 
Linux is not a problem at all.

CIFS is likely a very different animal. I suspect the cifs people have 
spent a whole lot more effort on strange Windows interaction issues than 
on trying to make sure that cached performance is top-notch.

			Linus


* Re: [Census] So who uses git?
  2006-02-01  3:48                       ` Linus Torvalds
@ 2006-02-01 19:30                         ` H. Peter Anvin
  0 siblings, 0 replies; 110+ messages in thread
From: H. Peter Anvin @ 2006-02-01 19:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Ray Lehtiniemi, Alex Riesen,
	Radoslaw Szkodzinski, Keith Packard, Junio C Hamano, cworth,
	Git Mailing List

Linus Torvalds wrote:
> 
> It's not magic, and it's not all that recent. Linux FS ops have always 
> been pretty good, and the dentry cache was introduced in 2.0.x, I think, 
> so you'd be hard-pressed to find a Linux system that doesn't have it.
> 

2.1.14, I seem to remember -- it was definitely 2.1.1x-ish.  I mostly 
recall because autofs didn't just break horribly, it took adding several 
dcache hooks to make it work again :)

	-hpa


* Re: [Census] So who uses git?
  2006-01-30 18:58         ` Carl Baldwin
  2006-01-31 10:27           ` Johannes Schindelin
@ 2006-02-01 19:32           ` H. Peter Anvin
  1 sibling, 0 replies; 110+ messages in thread
From: H. Peter Anvin @ 2006-02-01 19:32 UTC (permalink / raw)
  To: Carl Baldwin
  Cc: Junio C Hamano, Keith Packard, Martin Langhoff, Linus Torvalds,
	Git Mailing List

Carl Baldwin wrote:
> 
> - Anyone can install and fire it up without license/contract hassles.
> 

For something like an SCM this is a big deal, and not just for the Open 
Source world.  In a company, it means not having to worry about having 
enough licenses, and getting budget approval, etc, etc, before a new 
person can join a project.  Perhaps more importantly, it allows someone 
who normally isn't *on* the project to look at it and participate.

	-hpa


* Re: [Census] So who uses git?
  2006-01-31 17:30             ` Linus Torvalds
  2006-01-31 18:12               ` J. Bruce Fields
  2006-01-31 19:01               ` Keith Packard
@ 2006-02-01 19:34               ` H. Peter Anvin
  2 siblings, 0 replies; 110+ messages in thread
From: H. Peter Anvin @ 2006-02-01 19:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Carl Baldwin, Junio C Hamano, Keith Packard,
	Martin Langhoff, Git Mailing List

Linus Torvalds wrote:
>>
>>For example, I had a hard time explaining to a friend why a git-add'ed 
>>file is committed when saying "git commit some_other_file", but not 
>>another (modified) file. Very unintuitive.
> 
> I really think you should explain it one of two ways:
> 
>  - ignore it. Never _ever_ use git-update-index directly, and don't tell 
>    people about using individual filenames with git-commit. Maybe even add 
>    "-a" by default to the git-commit flags as a special installation 
>    addition.
> 
>  - talk about the index, and revel in it as a way to explain the staging 
>    area. This is what the old tutorial.txt did before it got simplified.
> 
> The "ignore the index" approach is the simple one to explain. It's 
> strictly less powerful, but hey, what else is new? 
> 

I think both of these are probably the wrong answer, and it's pretty 
much a matter of the git model violating the principle of least 
surprise.  Perhaps added (or removed?) files need to be handled in a 
different way than they currently are.

	-hpa


* Re: [Census] So who uses git?
  2006-02-01 17:18                     ` Nicolas Pitre
@ 2006-02-01 20:27                       ` Junio C Hamano
  2006-02-01 21:09                         ` Linus Torvalds
  2006-02-01 22:00                         ` Joel Becker
  0 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01 20:27 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git

Nicolas Pitre <nico@cam.org> writes:

> On Tue, 31 Jan 2006, Junio C Hamano wrote:
>
>> People who do not like this can set in their config file some
>> flag, say, 'core.index = understood', to get the current
>> behaviour.
>
> I'd avoid hidden config options that magically change behaviors and 
> semantics like that as much as possible....

I agree; it was tongue-in-cheek sort of suggestion ;-)

> It is much more intuitive to expect that, if you specify path arguments 
> to commit, then only those paths are considered, and even if you didn't 
> do a git add on some of them.  If nothing is specified then the current 
> index (the default, including a-new-file) is considered.

Good thinking.  I was not thinking about the case where you
explicitly list an untracked file to be added.

>  - a non-merge commit without any argument would imply -a.
>
>  - a non-merge commit with path arguments implies _only_ those paths, 
>    regardless if they were previously "git add"ed or not.
>
>  - a non-merge commit with, say, --no-auto or --current-index or 
>    whatever would preserve the current behavior, with or without 
>    additional paths.
>
>  - a merge commit ...
>  - a merge commit ...
>
> This might look complicated when presented like that, but I think that 
> the default behavior of each (non-merge vs merge) commit would more 
> closely fit most people's expectations....

If I may correct what I said earlier, I now realize the
"automatic -a is dangerous" argument does not have anything to
do with merges.  If the user usually works with a dirty working
tree, is aware of the index, and takes advantage of the index as
the staging area for the next commit, your --no-auto would be
needed to help her workflow.  I in principle agree with the
first three items in the above summary, except that I think it
would make more sense to do that for all commits.

How about this:

 - "git commit --also fileA..." means: update index at listed
   paths (add/remove if necessary) and then commit the tree
   described in index (the current behaviour with explicit paths).

 - "git commit fileA..." means: create a temporary index from the
   current HEAD commit (or empty index if there is none), update
   it at listed paths (add/remove if necessary) and commit the
   resulting tree.  Also update the real index at the listed
   paths (add/remove if necessary).  In the original index file,
   the paths listed must be either empty or match exactly the
   HEAD commit -- otherwise we error out (Linus' suggestion).

 - "git commit" means: update index with all local changes and
   then commit the tree described in index (current "-a"
   behaviour).

 - In all cases, revert the index to the state before the
   command is run if we end up not making the commit (e.g. index
   unmerged, empty log message, pre-commit hook refusal).

Experienced git users would end up saying "--also" without
explicit paths to defeat the automatic -a behaviour all the
time, and while the flag --also makes perfect sense when used
with one or more paths, using it like this looks awkward:

        $ edit some-file
        $ git update-index some-file
        $ git commit --also

It's just a flag name so we could make --no-auto a synonym for --also.

A minor twist of the above to make it friendlier to the current
git users is to do this:

 - "git commit fileA...", "git commit -a", and "git commit" keep
   the existing semantics.

 - "git commit --only fileA..." does the new temporary index
   thing.

This has an advantage that existing use is not affected, and
another advantage is that internally it is more consistent ("git
commit" is a natural extension of "git commit fileA..." with
zero path).  But one possible downside is that you need to
explicitly say --only when you want cvs-like "commit".

Since we are discussing that people find the existing
interface to be unintuitive, being consistent with the current
usage may not count as a big advantage after all..


* Re: [Census] So who uses git?
  2006-02-01  9:59                         ` Randal L. Schwartz
@ 2006-02-01 20:48                           ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01 20:48 UTC (permalink / raw)
  To: Randal L. Schwartz; +Cc: git

merlyn@stonehenge.com (Randal L. Schwartz) writes:

>>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:
>
> Junio> *1* The reason he has unrelated changes while doing a merge is
> Junio> because he works on things himself (I am speculating about
> Junio> this),
>
> You need to speculate that Linus works on things himself? :)

Forgot a smiley ;-).


* Re: [Census] So who uses git?
  2006-02-01 20:27                       ` Junio C Hamano
@ 2006-02-01 21:09                         ` Linus Torvalds
  2006-02-01 21:34                           ` Nicolas Pitre
  2006-02-01 21:59                           ` Junio C Hamano
  2006-02-01 22:00                         ` Joel Becker
  1 sibling, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01 21:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git



On Wed, 1 Feb 2006, Junio C Hamano wrote:
> 
> How about this:
> 
>  - "git commit --also fileA..." means: update index at listed
>    paths (add/remove if necessary) and then commit the tree
>    described in index (the current behaviour with explicit paths).

I'd suggest "--incremental" instead of "--also".

>  - "git commit fileA..." means: create a temporary index from the
>    current HEAD commit (or empty index if there is none), update
>    it at listed paths (add/remove if necessary) and commit the
>    resulting tree.  Also update the real index at the listed
>    paths (add/remove if necessary).  In the original index file,
>    the paths listed must be either empty or match exactly the
>    HEAD commit -- otherwise we error out (Linus' suggestion).

Yes.

>  - "git commit" means: update index with all local changes and
>    then commit the tree described in index (current "-a"
>    behaviour).

No. Please no. "git commit" should continue to do what it does now. 
Otherwise you can't do the two-stage thing in any sane way.

Requiring "--incremental"/"--also" is very confusing.

If somebody doesn't know about the index, he normally will never have 
index changes _anyway_, except for the "git add" case. In which case "git 
commit" does the right thing for him: it will either commit the added 
files, or it will say "nothing to commit".

		Linus


* Re: [Census] So who uses git?
  2006-02-01 21:09                         ` Linus Torvalds
@ 2006-02-01 21:34                           ` Nicolas Pitre
  2006-02-01 21:59                           ` Junio C Hamano
  1 sibling, 0 replies; 110+ messages in thread
From: Nicolas Pitre @ 2006-02-01 21:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Wed, 1 Feb 2006, Linus Torvalds wrote:

> 
> 
> On Wed, 1 Feb 2006, Junio C Hamano wrote:
> > 
> > How about this:
> > 
> >  - "git commit --also fileA..." means: update index at listed
> >    paths (add/remove if necessary) and then commit the tree
> >    described in index (the current behaviour with explicit paths).
> 
> I'd suggest "--incremental" instead of "--also".
> 
> >  - "git commit fileA..." means: create a temporary index from the
> >    current HEAD commit (or empty index if there is none), update
> >    it at listed paths (add/remove if necessary) and commit the
> >    resulting tree.  Also update the real index at the listed
> >    paths (add/remove if necessary).  In the original index file,
> >    the paths listed must be either empty or match exactly the
> >    HEAD commit -- otherwise we error out (Linus' suggestion).
> 
> Yes.

Agreed.

> >  - "git commit" means: update index with all local changes and
> >    then commit the tree described in index (current "-a"
> >    behaviour).
> 
> No. Please no. "git commit" should continue to do what it does now. 
> Otherwise you can't do the two-stage thing in any sane way.
> 
> Requiring "--incremental"/"--also" is very confusing.
> 
> If somebody doesn't know about the index, he normally will never have 
> index changes _anyway_, except for the "git add" case. In which case "git 
> commit" does the right thing for him: it will either commit the added 
> files, or it will say "nothing to commit".

Sensible.  As long as "commit files..." actually commits _only_ those 
files unless --index (or something) is specified to also explicitly 
include the index changes.

What is really counter-intuitive is to have index changes merged by 
default when a single file is specified as argument to commit.


Nicolas


* Re: [Census] So who uses git?
  2006-02-01 21:09                         ` Linus Torvalds
  2006-02-01 21:34                           ` Nicolas Pitre
@ 2006-02-01 21:59                           ` Junio C Hamano
  2006-02-01 22:25                             ` Nicolas Pitre
                                               ` (2 more replies)
  1 sibling, 3 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01 21:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, git

Linus Torvalds <torvalds@osdl.org> writes:

>>  - "git commit" means: update index with all local changes and
>>    then commit the tree described in index (current "-a"
>>    behaviour).
>
> No. Please no. "git commit" should continue to do what it does now. 
> Otherwise you can't do the two-stage thing in any sane way.
> Requiring "--incremental"/"--also" is very confusing.

I myself did not like it but...

> If somebody doesn't know about the index, he normally will never have 
> index changes _anyway_, except for the "git add" case. In which case "git 
> commit" does the right thing for him: it will either commit the added 
> files, or it will say "nothing to commit".

... the original complaint was that "git commit" without
explicit paths does not quack like "cvs/svn commit" -- commit
all my changes in the working tree.

And actually the one you are responding to was my cunning move
to pull this exact reaction from you: "No, commit without
parameter should not imply -a".  I prefer the "minor twist"
version in the same message myself.

To recap:

 - "git commit fileA..." means: update index at listed paths
   (add/remove if necessary) and then commit the tree described
   in index (the same as the current behaviour with explicit
   paths).

 - "git commit -a" means: update index with all local changes and
   then commit the tree described in index (the same as the
   current behaviour).

 - "git commit" means: write out the current index and commit
   (the same as the current behaviour).

 - "git commit --only fileA..." means: create a temporary index
   from the current HEAD commit (or empty index if there is
   none), update it at listed paths (add/remove if necessary)
   and commit the resulting tree.  Also update the real index at
   the listed paths (add/remove if necessary).  In the original
   index file, the paths listed must be either empty or match
   exactly the HEAD commit -- otherwise we error out (Linus'
   suggestion).

 - In all cases, revert the index to the state before the
   command is run if we end up not making the commit (e.g. index
   unmerged, empty log message, pre-commit hook refusal).  With
   this, "git diff-files fileA" would show the differences as it
   showed before an aborted "git commit -a" or "git commit fileA"
   and removes one common gripe.
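
As a rough illustration of the "--only" item in the recap: current git releases do ship --only and --include options for git commit. The sketch below shows just the headline behaviour in a throwaway repository with made-up file names; it does not check the index-must-match-HEAD rule or the revert-on-abort rule proposed above.

```shell
# Sketch: "--only" commits the named path and leaves other staged changes alone.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)              # throwaway repository (assumption)
cd "$repo"
git init -q
echo a > fileA
echo b > fileB
git add fileA fileB
git commit -q -m initial

echo a2 >> fileA
echo b2 >> fileB
git add fileB                  # staged, but not named on the commit command line

git commit -q -m "fileA only" --only -- fileA

committed=$(git diff-tree -r --name-only --no-commit-id HEAD)
still_staged=$(git diff --cached --name-only)
echo "committed: $committed; still staged: $still_staged"
```

The staged change to fileB survives in the index for a later commit, which is the two-stage workflow the index exists for.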


* Re: [Census] So who uses git?
  2006-02-01 20:27                       ` Junio C Hamano
  2006-02-01 21:09                         ` Linus Torvalds
@ 2006-02-01 22:00                         ` Joel Becker
  1 sibling, 0 replies; 110+ messages in thread
From: Joel Becker @ 2006-02-01 22:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Linus Torvalds, git

On Wed, Feb 01, 2006 at 12:27:17PM -0800, Junio C Hamano wrote:
>  - "git commit fileA..." means: create a temporary index from the
>    current HEAD commit (or empty index if there is none), update
>    it at listed paths (add/remove if necessary) and commit the

	Please don't do the add/remove automatically.  I know, it's
pretty convenient if I explicitly say "git commit filetoadd", but what
happens if I say "git commit libfoo/*"?  I know that I want all my
changes in libfoo/ to be committed, ignoring my changes in libbar/.  But
I forgot that I created libfoo/testfoo.c to debug my changes, and now
it's in the repository -- and I might not even notice it for weeks.
	CVS and Subversion require an explicit "add" for this very
reason.  Even then, almost everyone gets an "import" or two wrong,
pulling in a couple built files (eg, "configure") they didn't mean to
get.
	I guess you could query the user.  "I noticed that you specified
filetoadd, and you never said 'git add'.  Do you want to add it now
[Y/n]?"
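
One way to see what an automatic add would sweep in is to list the untracked files under the directory before committing. This is a sketch with made-up file names in a throwaway repository, assuming a reasonably recent git; it only reports the files and changes nothing.

```shell
# Sketch: spot untracked files that an implicit "add" under libfoo/ would pick up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)              # throwaway repository (assumption)
cd "$repo"
git init -q
mkdir libfoo
echo 'int foo(void) { return 0; }' > libfoo/foo.c
git add libfoo/foo.c
git commit -q -m initial

echo '/* changed */' >> libfoo/foo.c                     # the intended change
echo 'int main(void) { return 0; }' > libfoo/testfoo.c   # forgotten debug file

untracked=$(git ls-files --others --exclude-standard -- libfoo)
modified=$(git ls-files --modified -- libfoo)
echo "never added: $untracked"
echo "tracked and modified: $modified"
```

A wrapper could print the first list and ask for confirmation, along the lines of the prompt suggested above.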

Joel




* Re: [Census] So who uses git?
  2006-02-01 21:59                           ` Junio C Hamano
@ 2006-02-01 22:25                             ` Nicolas Pitre
  2006-02-01 22:50                               ` Junio C Hamano
  2006-02-01 22:35                             ` Linus Torvalds
  2006-02-01 22:57                             ` [Census] So who uses git? Daniel Barkalow
  2 siblings, 1 reply; 110+ messages in thread
From: Nicolas Pitre @ 2006-02-01 22:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

On Wed, 1 Feb 2006, Junio C Hamano wrote:

> To recap:
> 
>  - "git commit fileA..." means: update index at listed paths
>    (add/remove if necessary) and then commit the tree described
>    in index (the same as the current behaviour with explicit
>    paths).

No.

>  - "git commit -a" means: update index with all local changes and
>    then commit the tree described in index (the same as the
>    current behaviour).

Sensible.

>  - "git commit" means: write out the current index and commit
>    (the same as the current behaviour).

Sensible.

>  - "git commit --only fileA..." means: create a temporary index
>    from the current HEAD commit (or empty index if there is
>    none), update it at listed paths (add/remove if necessary)
>    and commit the resulting tree.  Also update the real index at
>    the listed paths (add/remove if necessary).  In the original
>    index file, the paths listed must be either empty or match
>    exactly the HEAD commit -- otherwise we error out (Linus'
>    suggestion).

Actually, my opinion is that should be the behavior for your first item 
above (when only filenames are specified).  If you want to _also_ 
include the index like you describe in your first item then an 
additional switch should be provided.

In other words, the --only should become --with-index with the behavior 
swapped.

The fact is that when you simply specify a filename, you really expect 
_only_ that filename will be affected and the rest be left alone.  
That's the most probable expectation for any tool.  If you want 
_additional_ stuff to also be merged along with the files specified then 
it is logical to have an additional argument in that case, not the other 
way around.

Nicolas


* Re: [Census] So who uses git?
  2006-02-01 21:59                           ` Junio C Hamano
  2006-02-01 22:25                             ` Nicolas Pitre
@ 2006-02-01 22:35                             ` Linus Torvalds
  2006-02-01 23:33                               ` Two ideas for improving git's user interface Carl Worth
  2006-02-01 22:57                             ` [Census] So who uses git? Daniel Barkalow
  2 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01 22:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git

On Wed, 1 Feb 2006, Junio C Hamano wrote:
> 
> ... the original complaint was that "git commit" without
> explicit paths does not quack like "cvs/svn commit" -- commit
> all my changes in the working tree.

Agreed. However, I think that one is pretty easy to explain, and 
conceptually it's not a problem to just tell people to use the "-a" flag 
if they want to get CVS/SVN semantics.

After all, "git commit" will actually make it pretty obvious in the commit 
message status, _and_ if you haven't done any "git add" you'll get the 
"nothing to commit" thing, so it's not like this is hard to explain.

The real _confusion_ I think came from the filename usage.

		Linus


* Re: [Census] So who uses git?
  2006-02-01 22:25                             ` Nicolas Pitre
@ 2006-02-01 22:50                               ` Junio C Hamano
  2006-02-02 14:59                                 ` Andreas Ericsson
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01 22:50 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git, Joel Becker

Nicolas Pitre <nico@cam.org> writes:

> Actually, my opinion is that should be the behavior for your first item 
> above (when only filenames are specified).  If you want to _also_ 
> include the index like you describe in your first item then an 
> additional switch should be provided.

OK, agreed.  Sorry to be slow.

So, to recap:

git commit paths...			(temporary index thing)
git commit --incremental paths...	(same as current w/o --incremental)
git commit               		(same as current)
git commit -a				(same as current)	
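
A minimal sketch of the "temporary index thing" using plumbing commands, assuming a current git and a throwaway repository. The file names and the tmp-index path are made up for illustration, and the proposed check that the listed paths still match HEAD in the real index is left out:

```shell
set -eu
# Identity for the scratch commits (illustrative values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > fileA
echo one > fileB
git add fileA fileB
git commit -q -m "initial"

echo two-A > fileA            # the change we want to commit
echo two-B > fileB
git add fileB                 # staged in the real index; must stay out of the commit

# Build a temporary index from HEAD and update only the listed path in it.
tmp_index="$repo/.git/tmp-index"     # hypothetical file name
GIT_INDEX_FILE="$tmp_index" git read-tree HEAD
GIT_INDEX_FILE="$tmp_index" git update-index fileA
tree=$(GIT_INDEX_FILE="$tmp_index" git write-tree)

# Commit that tree on top of HEAD, then update the real index at fileA as well.
commit=$(echo "commit only fileA" | git commit-tree "$tree" -p HEAD)
git update-ref HEAD "$commit"
git update-index fileA
rm -f "$tmp_index"

committed_A=$(git show HEAD:fileA)
committed_B=$(git show HEAD:fileB)
still_staged=$(git diff --cached --name-only)
echo "fileA=$committed_A fileB=$committed_B staged=$still_staged"
```

The staged change to fileB survives in the real index, so a later plain "git commit" would still pick it up.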

And I agree with Joel that we should not automatically imply
"git add" with or without --incremental.

I do not particularly have much preference among --also,
--with-index, or --incremental, but:

 - 'with-index' is precise but might be too technical;
 - 'incremental' is not really incremental -- you can use it
   only once.

Because you do not have to say "git commit --also" without paths
(which _is_ awkward) to get the traditional behaviour, maybe it
is a good name for that flag (it is also the shortest).


* Re: [Census] So who uses git?
  2006-02-01 21:59                           ` Junio C Hamano
  2006-02-01 22:25                             ` Nicolas Pitre
  2006-02-01 22:35                             ` Linus Torvalds
@ 2006-02-01 22:57                             ` Daniel Barkalow
  2 siblings, 0 replies; 110+ messages in thread
From: Daniel Barkalow @ 2006-02-01 22:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Nicolas Pitre, git

On Wed, 1 Feb 2006, Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > If somebody doesn't know about the index, he normally will never have 
> > index changes _anyway_, except for the "git add" case. In which case "git 
> > commit" does the right thing for him: it will either commit the added 
> > files, or it will say "nothing to commit".
> 
> ... the original complaint was that "git commit" without
> explicit paths does not quack like "cvs/svn commit" -- commit
> all my changes in the working tree.

Actually, the original complaint was about "git commit path ...", I 
believe. That's the case where new users are finding that the behavior is 
surprising, rather than just unfamiliar.

	-Daniel
*This .sig left intentionally blank*


* Two ideas for improving git's user interface
  2006-02-01 22:35                             ` Linus Torvalds
@ 2006-02-01 23:33                               ` Carl Worth
  2006-02-02  0:38                                 ` Junio C Hamano
                                                   ` (3 more replies)
  0 siblings, 4 replies; 110+ messages in thread
From: Carl Worth @ 2006-02-01 23:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Nicolas Pitre, git


On Wed, 1 Feb 2006 14:35:33 -0800 (PST), Linus Torvalds wrote:
>
> Agreed. However, I think that one is pretty easy to explain, and 
> conceptually it's not a problem to just tell people to use the "-a" flag 
> if they want to get CVS/SVN semantics.

"Just use -a" is tempting, but I don't think it's a satisfying stance
to take.

Consider the following operations:

	echo "original" > A; git add A; echo "modified" > A;
	git commit -a -m "add A";

	echo "original" > B; git add B; echo "modified" > B;
	git commit -m "add B" B;

	echo "original" > C; git add C; echo "modified" > C;
	git commit -m "add C";

After which we can see:

	$ git diff
	diff --git a/C b/C
	index 4b48dee..2e09960 100644
	--- a/C
	+++ b/C
	@@ -1 +1 @@
	-original
	+modified

To explain this, "just use -a" isn't enough, it would have to be
something like, "always use -a or else 'git commit' just won't work
and you can end up committing stale garbage". And perhaps "unless you
also add the filename to the commit line, then it will start working
again."
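
The three commits above can be replayed in a throwaway repository to see which content each one recorded. This is a sketch that assumes a current git; the identity values are placeholders:

```shell
set -eu
# Identity for the scratch commits (illustrative values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q

# -a: the index is refreshed from the working tree before committing.
echo "original" > A; git add A; echo "modified" > A
git commit -q -a -m "add A"

# Explicit path: the working-tree content of B is what gets committed.
echo "original" > B; git add B; echo "modified" > B
git commit -q -m "add B" B

# No -a, no path: the index is committed as it stands.
echo "original" > C; git add C; echo "modified" > C
git commit -q -m "add C"

committed_A=$(git show HEAD:A)
committed_B=$(git show HEAD:B)
committed_C=$(git show HEAD:C)
still_dirty=$(git diff --name-only)
echo "A=$committed_A B=$committed_B C=$committed_C dirty=$still_dirty"
```

Only C ends up committed with its stale "original" content, which is why it is the one file "git diff" still reports.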

An explanation for the above behavior requires a rather careful
description of the index, the operations, the flags, and some rather
subtle interactions between them.

I don't even think "embrace the index" is enough to make the above
behavior obvious---the variations in the above behavior are a bit too
subtle. 

But I don't think git is doomed to be hard to learn or that its
behavior needs to be hard to predict. I think this should be fairly
easy to fix.

Here's a fundamental question I have, (and thanks to Keith Packard for
helping me to phrase it):

	Is it ever useful (reasonable, desirable) to commit file
	contents that differ from the contents of the working
	directory?

I don't think it is, (but please let me know if I've missed some
useful case).

Idea #1 (prevent the index from being used to commit stale data)
-------
If this isn't useful, then I think git would do well to make it
harder/impossible to perform this operation. For example, the index
could have a new notion of "use working directory contents" for a
given file in addition to the current "use this blob". This would
allow a user to use the index to stage subsequent file
additions/modifications for commit without introducing the various
opportunities for confusing commits of stale data.

I would think this would then naturally resolve the confusion around
the various diff operations, (diff-index, diff-index --cached, and
diff-files).

Idea #2 (make it easy to preview diffs of what will be committed)
-------
Independent of the above, I'd like to propose another change to help
prevent confusion and to help users learn git. There should be an
obvious "diff" operation that presents exactly the result of what any
"commit" operation will perform.

I assume that there currently exist appropriate diff operations for
any commit, but the correspondence certainly isn't obvious. For
example, the simplest of commit commands:

	git commit

seems to correspond to a rather complex diff command (which I may not
have completely correct yet---and if not, that would just demonstrate
the point even more):

	git diff-index -p --cached HEAD

What I would love to have is the ability to pass the same arguments to
git diff to get a preview of what any git commit would do. For
example, something like:

	git diff		# would be a preview of:
	git commit

	git diff -a		# would be a preview of:
	git commit -a

	git diff fileA fileB	# would be a preview of:
	git commit fileA fileB

etc.

Again, thanks for your consideration of these thoughts from a woefully
clueless and inexperienced user.

-Carl


* Re: Two ideas for improving git's user interface
  2006-02-01 23:33                               ` Two ideas for improving git's user interface Carl Worth
@ 2006-02-02  0:38                                 ` Junio C Hamano
  2006-02-02  1:16                                   ` Carl Worth
  2006-02-02  1:23                                 ` Linus Torvalds
                                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2006-02-02  0:38 UTC (permalink / raw)
  To: Carl Worth; +Cc: git, Nicolas Pitre, Linus Torvalds

Carl Worth <cworth@cworth.org> writes:

> To explain this, "just use -a" isn't enough, it would have to be
> something like, "always use -a or else 'git commit' just won't work
> and you can end up committing stale garbage". And perhaps "unless you
> also add the filename to the commit line, then it will start working
> again."

I do not think you have to make it sound *that* negative.  I
agree it may be counterintuitive until the user groks the index.

Let's assume that we will fix things to (1) require "--also" (or
"--incremental") to get the current "git commit paths..."
behaviour, (2) without any arguments we commit the index as is,
(3) with explicit paths we commit clean HEAD plus only specified
paths using a temporary index.  I think a fairer way to say what
you said would be:

        Always use -a, or explicit paths.  With -a all of your
        changes in the working tree are committed.  With paths,
        only changes to those paths are committed.

        Once you are comfortable with making commits this way,
        you might want to learn about index file and then start
        using 'git commit' without any argument.  This works in
        a way that cannot be understood until you learn how the
        index file works, so stick to "-a or explicit paths"
        rule for now.  That rule is good enough for everyday
        use.

And you can probably go a long way without ever knowing about
index.  Initially when I wrote the above two paragraphs, I said
"appreciated" instead of "understood".  But depending on your
workflow, you may not even need what "git commit" without
arguments would give you, in which case there is nothing to
appreciate about, so I changed the wording.

Old-timer git people seem to like what it gives them but that
does not mean everybody should marvel at what it does and adopt
the workflow to take advantage of the index file.

> Here's a fundamental question I have, (and thanks to Keith Packard for
> helping me to phrase it):
>
> 	Is it ever useful (reasonable, desirable) to commit file
> 	contents that differ from the contents of the working
> 	directory?

What that means is people should always do "git commit -a".  Not
even "git commit paths...".  It matches _my_ sense of developer
discipline, especially for individual developers, but it is a
rather cumbersome straightjacket if enforced upon you in
practice.  It is a useful timesaver to be able to leave
unrelated changes around in the working tree.

> I don't think it is, (but please let me know if I've missed some
> useful case).

I think I've already done this a couple of times today.

Your "git diff" is interesting, but I'd rather make it a
completely separate command from "git diff".  Perhaps "git
ndiff" and "git ncommit", which assume there is nothing but "git
commit -a" kind of commits.


* Re: Two ideas for improving git's user interface
  2006-02-02  0:38                                 ` Junio C Hamano
@ 2006-02-02  1:16                                   ` Carl Worth
  2006-02-02  2:25                                     ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Carl Worth @ 2006-02-02  1:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nicolas Pitre, Linus Torvalds


On Wed, 01 Feb 2006 16:38:45 -0800, Junio C Hamano wrote:
> 
> I do not think you have to make it sound *that* negative.

Sorry about that. I was just trying to emphasize the new-user
confusion, and perhaps I went overboard.

>          It is a useful timesaver to be able to leave
> unrelated changes around in the working tree.
> 
> > I don't think it is, (but please let me know if I've missed some
> > useful case).
> 
> I think I've already done this a couple of times today.

I'm sorry. I didn't succeed in phrasing the question the way I
wanted. Yes, it is useful to be able to leave unrelated changes around
in the working tree. So in that sense, it is clearly useful to be able
to commit something that is different (in a repository-wide sense)
than what is in the working tree.

The question I was trying to ask is, for a _single file_ is it ever
useful to commit contents that differ from the contents of the working
directory? Let's call this a "skewed file" in the index.

I haven't used git much yet, but I found two cases for when one might
end up committing a skewed file:

1) Modification of working directory after git-update-index or git-add.

   There has been discussion in this thread already that the user can
   get a confusing commit in this case.

2) git-read-tree -m	# without -u

   The git documentation already advertises that not using -u here
   leads to confusion. This one looks historical, and it's not obvious
   to me whether git-read-tree is used in practice without -u.

So, in both of those cases the skewed files seem to lead only to
confusion. Are there any non-confusing cases where it's useful to be
able to commit a skewed file?

If not, we should be able to simplify things since a lot of the
UI complexity being discussed (-a vs. no -a, path names vs. no path
names), hinges on the handling of skewed files.

> Your "git diff" is interesting, but I'd rather make them
> completely separate command from "git diff".  Perhaps "git
> ndiff" and "git ncommit", that assumes there is nothing but "git
> commit -a" kind of commits.

I'd be fine with some other name than "diff" if strictly necessary,
but I'm not suggesting something that makes any assumption about "git
commit -a" only. What I want is a simple way to take any "git commit"
command and be able to examine the diff that it will be committing.

My workflow has been to always perform a final review of such a diff
while composing the commit message. I'd like to be able to do that
with git.

And I think this tool would make a very good learning tool for users
trying to figure out the various commit operations, (particularly if
we end up with different semantics for merge vs. non-merge, -a vs. no
-a, path names vs. no path names, etc.).

-Carl


* Re: Two ideas for improving git's user interface
  2006-02-01 23:33                               ` Two ideas for improving git's user interface Carl Worth
  2006-02-02  0:38                                 ` Junio C Hamano
@ 2006-02-02  1:23                                 ` Linus Torvalds
  2006-02-02  1:44                                   ` Linus Torvalds
  2006-02-04  0:20                                   ` Carl Worth
  2006-02-02 12:31                                 ` Florian Weimer
  2006-02-02 16:30                                 ` Carl Baldwin
  3 siblings, 2 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-02-02  1:23 UTC (permalink / raw)
  To: Carl Worth; +Cc: Junio C Hamano, Nicolas Pitre, git

On Wed, 1 Feb 2006, Carl Worth wrote:
> 
> Here's a fundamental question I have, (and thanks to Keith Packard for
> helping me to phrase it):
> 
> 	Is it ever useful (reasonable, desirable) to commit file
> 	contents that differ from the contents of the working
> 	directory?

Yes. I do it all the time.

I tend to have a certain fairly constant set of changes in my working 
tree, namely every time a release is getting closer, I always tend to have 
the "Makefile" already updated for the new version (but not checked in: I 
do that just before I actually tag it, so that the tag will match the 
commit that actually changes the version).

I do that largely for historical reasons, namely that I've forgotten too 
many times to actually change the version number, and then I usually get a 
bug report within minutes of cutting the release with a snickering "hah, 
you forgot to change the version again".

So I do lots of commits with that Makefile being dirty, without ever 
actually committing the Makefile changes themselves. "git commit -a" as a 
default would be absolutely _horrible_ for me.

I occasionally have other things dirty too in my tree - just random 
hacking. But the Makefile is dirty about 50% of the time for me, so it's 
the common case.

And most of those commits are automated, either through pulls that are 
successful, or just my email patch-application scripts, and both of those 
cases actually check that the files that are _changed_ are never dirty in 
the working directory.

However, if the question was an even stricter "do you ever commit 
_changes_ to a particular file where the last HEAD, the index _and_ the 
working tree are all different", then the answer is actually "Yes" to that 
too.

What has happened is that I have had merges that have content conflicts 
that I fix up by hand, but exactly _because_ I fix them up by hand, I 
actually want to re-compile the kernel and test my fixups.

And in that case, I will actually re-apply my manual Makefile change, even 
if that file was part of the merge changes (in which case I had had to 
first un-apply the change in order to do the merge).

So what happens is that I recompile with my trivial changes in place 
_after_ I have fixed up any merge conflicts, reboot the thing to test, and 
then commit the result if everything looks ok.

And notice how I commit the _merge_ without actually committing my dirty 
state in the tree - and whether the files involved in my standard dirty 
changes ("Makefile") are part of the state that the merge changed or not 
is _totally_ irrelevant.

So I commit file contents that differ from my current working tree all the 
time.

ALMOST all of the time, the actual _changes_ that I commit do not actually 
touch the files that I have dirty, but as explained above, even that is 
not at all impossible.

The thing is, once you get used to the git "index" as a staging place, 
it's really really powerful. 

> Idea #2 (make it easy to preview diffs of what will be committed)
> -------
> Independent of the above, I'd like to propose another change to help
> prevent confusion and to help users learn git. There should be an
> obvious "diff" operation that presents exactly the result of what any
> "commit" operation will perform.

Actually, we do exactly that. Right now we expressly limit the "preview" 
to just the filenames, but we literally do run

	git-diff-index -M --cached --name-status --diff-filter=MDTCRA HEAD

as part of "git status", and the eventual end result is what we will 
populate the commit message file with for your editing pleasure.
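
As a rough illustration of what that command reports, the sketch below runs it in a throwaway repository, assuming a current git where the command is spelled "git diff-index". The file names are hypothetical:

```shell
set -eu
# Identity for the scratch commits (illustrative values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo base > tracked.txt
echo base > dirty-only.txt
git add .
git commit -q -m "initial"

echo "staged edit" > tracked.txt
git add tracked.txt                  # modification recorded in the index
echo "brand new" > new.txt
git add new.txt                      # addition recorded in the index
echo "local edit" > dirty-only.txt   # dirty in the working tree only

# Name and status of every path that differs between HEAD and the index.
summary=$(git diff-index -M --cached --name-status --diff-filter=MDTCRA HEAD)
printf '%s\n' "$summary"
```

The working-tree-only edit to dirty-only.txt is absent from the summary, because --cached compares HEAD against the index and never looks at the working tree.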

And you can actually see that. 

So I would suggest that new git users never be told about the "-m" flag to 
"git commit", so that they always have to edit the commit message by hand, 
because that commit message will contain exactly this information.

Not the patch itself, though. Maybe we could make it show part of it, 
though, if somebody really wants to see it ;)

		Linus


* Re: Two ideas for improving git's user interface
  2006-02-02  1:23                                 ` Linus Torvalds
@ 2006-02-02  1:44                                   ` Linus Torvalds
  2006-02-04  8:03                                     ` Alan Chandler
  2006-02-04  0:20                                   ` Carl Worth
  1 sibling, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2006-02-02  1:44 UTC (permalink / raw)
  To: Carl Worth; +Cc: Junio C Hamano, Nicolas Pitre, git

On Wed, 1 Feb 2006, Linus Torvalds wrote:
> 
> And notice how I commit the _merge_ without actually committing my dirty 
> state in the tree - and whether the files involved in my standard dirty 
> changes ("Makefile") are part of the state that the merge changed or not 
> is _totally_ irrelevant.

If you get the feeling that merging is special, then to some degree, yes, 
you'd be right.

Merging (especially with conflicts) is the _one_ operation where you 
absolutely have to know about the index. If you don't know about how the 
index works, you can get the conflict resolution right kind of by 
accident, simply because the default workflow of

	.. edit conflict to look ok ..
	git commit file/with/conflict

actually happens to do exactly the right thing (very much on purpose, 
btw), but the fact is, to actually figure out more complicated conflicts 
and to _understand_ what happens, you absolutely need to be aware of the 
index. Not being aware of it just isn't an option for any serious git 
user.

(Btw, I think this is where cogito falls down. Cogito tries to hide the 
index file, but I don't think you really _can_ hide the index file and 
also do merges well at the same time. Anybody who has non-trivial merges 
should use raw git - not just because the "recursive" strategy just works 
better, but exactly because of the index file issue).

So when you work with a merge, the index file content really in a very 
real way _is_ the merge. Yes, the index file is also technically how git 
actually does all the merging complexity, but in this case, there also is 
no "diff" to the parent, and the number of changed files may be in the 
hundreds, yet "git diff" should be basically empty when you finally commit 
your merge.

I say "basically empty", because as I've explained, at least I personally 
have had dirty state in my tree at the time I commit a merge - on _top_ of 
(and independently of) the state that I actually commit.

So to recap:

 - you really do have to be aware of the index file at some point. Trying 
   to hide it entirely is a huge mistake.

 - real git power users _will_ use their awareness of the index file when 
   they commit. You will too, some day. Maybe it's only for merges, but I 
   wouldn't be surprised if somebody at some point wants to take advantage 
   of it even for "normal" working conditions (ie use "git-update-index" 
   to "freeze" a certain state for committing, and then editing the file 
   and _not_ committing those edits)

So making "-a" the default would be just a horrid horrid mistake. You can 
only hide the index so far - don't even try to hide it more.
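
The dirty-Makefile workflow described earlier, combined with "freezing" a state via update-index, might look like the sketch below. It assumes a current git; the file contents are invented for illustration:

```shell
set -eu
# Identity for the scratch commits (illustrative values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "VERSION = 1" > Makefile
echo "int main(void) { return 0; }" > main.c
git add Makefile main.c
git commit -q -m "initial"

echo "VERSION = 2 (not released yet)" > Makefile   # kept dirty on purpose
echo "/* unrelated fix */" >> main.c
git update-index main.c        # freeze main.c in the index for the next commit

git commit -q -m "fix main.c"  # commits the index; Makefile is left alone

committed_makefile=$(git show HEAD:Makefile)
still_dirty=$(git diff --name-only)
echo "committed Makefile: $committed_makefile; still dirty: $still_dirty"
```

With "git commit -a" instead, the version bump in Makefile would have been swept into the same commit.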

			Linus


* Re: Two ideas for improving git's user interface
  2006-02-02  1:16                                   ` Carl Worth
@ 2006-02-02  2:25                                     ` Junio C Hamano
  2006-02-03 23:57                                       ` Carl Worth
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2006-02-02  2:25 UTC (permalink / raw)
  To: Carl Worth; +Cc: git

Carl Worth <cworth@cworth.org> writes:

> If not, we should be able to simplify things since a lot of the
> UI complexity being discussed (-a vs. no -a, path names vs. no path
> names), hinges on the handling of skewed files.

I am in agreement with you that "skewed files" might lead to
confusion, but I do not see how that relates to "-a vs no -a" nor
"path names vs no path names" issues.

Let's say we try to detect and forbid committing skewed files.  How
would we do that?  For the sake of clarity, let's say we fixed the
commit command the way I said in the message you are responding to.

Now:

1. "git commit" is the traditional one; it commits the current index.
   We enumerate paths that 'git-diff-index --cached --name-only HEAD'
   tells are different (they are the paths to be committed -- what
   about merges?  Maybe take union from all parents?).  Then we see if
   the paths from "git-diff-files --name-only" (locally modified
   files) overlap with them.  Overlapping ones will be skewed if we
   make a commit.

2. "git commit --also fileA..." updates fileA... on top of the current
   index and commits that.  After doing "git update-index fileA...",
   the story is the same as the previous case.

3. "git commit fileA..." initializes a temporary index from the
   current HEAD, updates fileA... and commits that.  We would need a
   check to make sure index matches HEAD at specified paths, but after
   that check passes, there is no skewed files being committed and
   there is nothing more to check.

4. "git commit -a" by definition would not have skewed files and there
   is nothing to check.
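
The check in case 1 can be sketched as an intersection of two path lists. This assumes a current git plus the standard sort and comm utilities; the scratch file names are hypothetical, and the merge question raised above is not handled:

```shell
set -eu
# Identity for the scratch commits (illustrative values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > skewed.txt
echo one > staged.txt
echo one > dirty.txt
git add .
git commit -q -m "initial"

echo "staged version" > skewed.txt
git add skewed.txt
echo "edited again after staging" > skewed.txt   # index and working tree now differ
echo "staged version" > staged.txt
git add staged.txt                               # index matches the working tree
echo "never staged" > dirty.txt                  # index still matches HEAD

# Paths "git commit" would record: index differs from HEAD.
git diff-index --cached --name-only HEAD | sort > "$repo/.git/to-commit"
# Paths with local modifications: working tree differs from the index.
git diff-files --name-only | sort > "$repo/.git/locally-modified"
# Paths on both lists would be committed with content that is not in the working tree.
skewed=$(comm -12 "$repo/.git/to-commit" "$repo/.git/locally-modified")
echo "skewed: $skewed"
```

A wrapper around "git commit" could refuse to proceed whenever that intersection is non-empty.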

So what you say sounds doable.  But I wonder if that really helps
much.

Let's say we want to give an interface to a class of users who do
_not_ want to worry about the presence of the index file.  That means
they will _never_ run "git update-index" themselves, although "git
commit", "git add", and "git merge" may run update-index for them
internally.  Essentially, you tell them to always use "git commit -a"
or "git commit fileA...", and do not teach them "git commit", "git
commit --also fileA...".  IOW they will be doing only 3 or 4.  In this
case, we do not need any of the "skewed files" check.

The extra checks in 1 and 2 would prevent index-unaware users from
making obvious mistakes, but if they do not understand index then they
would still be surprised anyway.  For example, "git commit" commits
the files they previously ran "git add" on, but leaves other modified
files in the working tree uncommitted.  This is different from either
3 or 4 that they have learned so far.  If they did "git commit fileA",
the file they earlier ran "git add" on is not committed.  If they did
"git commit -a", files other than the added files are also committed.
So in that sense the above checks are doable but I do not think it
helps that much to alleviate the confusion.

These extra checks in 1 and 2 may protect index-aware users from
making mistakes, to a certain degree.  I am not convinced enough
myself to pay the cost of extra checks, though, because my workflow is
to do the final review exactly like what you said below.

> My workflow has been to always perform a final review of such a diff
> while composing the commit message. I'd like to be able to do that
> with git.

That matches my workflow.  I do either one of these (I never use "git
commit paths..."):

	$ work work work
        $ I may do update-index [--add|--remove] here
        $ git diff --cached
        $ git commit

	$ work work work
        $ I may do update-index [--add|--remove] here
        $ git diff HEAD
        $ git commit -a

In either case "skewed files" do not matter.  This can be summarized
in a short paragraph:

	If you are going to commit with "git commit" (no parameters),
	check the final result with "git diff --cached".  If you are
	going to commit with "git commit -a", check with "git diff
	HEAD".
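
A small sketch of that rule in a throwaway repository, assuming a current git, compares the preview for a plain "git commit" with what the commit then records. The file names are made up:

```shell
set -eu
# Identity for the scratch commits (illustrative values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo base > staged.txt
echo base > unstaged.txt
git add .
git commit -q -m "initial"

echo "staged change" > staged.txt
git add staged.txt
echo "unstaged change" > unstaged.txt

preview_plain=$(git diff --cached --name-only)   # preview of: git commit
preview_all=$(git diff --name-only HEAD)         # preview of: git commit -a

git commit -q -m "commit the index only"
committed=$(git diff-tree -r --name-only --no-commit-id HEAD)

echo "previewed: $preview_plain"
echo "committed: $committed"
```

Dropping --name-only from either preview shows the full patch, which is the final review described above.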

I said why I do not do "git commit paths..." myself, but I think this
"skewed files" discussion adds another thing to be careful about if
you use it.  If you do this (with the current tool, you drop --also):

	$ work on file A
        $ git diff A
        ... that looks fine so far ...
        $ git update-index A
        $ work more on file A
        $ git diff A
        ... incrementally that looks fine ...
        $ git commit --also A

you would end up committing something on which you have not done the
"final review".  You need to have the final check before such a commit:

	$ work on file A
        $ git diff A
        ... that looks fine so far ...
        $ git update-index A
        $ work more on file A
        $ git diff A
        ... incrementally that looks fine ...
 +++++  $ git diff HEAD
        $ git commit --also A

This includes all changes that are not in the index and are not going
to be included in the commit (i.e. changes to files other than A).
For that you may need to do something like:

	git-diff-index --cached HEAD ;# already in index but do not look at A
        git-diff-index HEAD -- A ;# and path A is taken from working tree

which is a bit cumbersome.

Without --also (the new semantics), the check would be
straightforward:

	$ work on file A
        $ git diff A
        ... that looks fine so far ...
        $ git update-index A
        $ work more on file A
        $ git diff A
        ... incrementally that looks fine ...
 +++++  $ git diff HEAD -- A
	$ git commit A


* Re: [Census] So who uses git?
  2006-02-01 16:25                         ` Linus Torvalds
@ 2006-02-02  9:12                           ` Alex Riesen
  0 siblings, 0 replies; 110+ messages in thread
From: Alex Riesen @ 2006-02-02  9:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Ray Lehtiniemi, Radoslaw Szkodzinski,
	Keith Packard, Junio C Hamano, cworth, Git Mailing List

On 2/1/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > $ time git update-index --refresh
> >
> > real    0m21.500s
> > user    0m0.358s
> > sys     0m1.406s
> >
> > WinNT, NTFS, 13k files, hot cache.
>
> That's 25% less files than the Linux kernel, and I can do that operation
> in 0m0.062s (0.012s user, 0.048s system).

Correction: it's 18k files, which is almost the same as 2.6.13-rc6. But these
files have *very* long names (the project is poisoned by classical C++ education
and breaks Windows' 255-char limit on filename length from time to time).
Refreshing the index in 2.6.13 is actually consistently faster:

$ cd src/linux-2.6.13-rc6
$ time git update-index --refresh
real    0m1.344s
user    0m0.358s
sys     0m0.984s

> So WinNT/cygwin is about 2.5 _orders_of_magnitude_ slower here, or 340
> times slower.
>
> Now, I'm tempted to say that NT is a piece of sh*t, but the fact is, your
> CPU-times seem to indicate that most of it is IO (and the "real" cost is
> just 1.7 seconds, much of which is system time, which in turn itself is
> probably due to the IO costs too - so even that isn't comparable with
> the ).
>
> Which may mean that you simply don't have enough memory to cache the whole
> thing. Which may be NT sucking, of course ("we don't like to use more than
> 10% of memory for caches"), but it might also be a tunable (which is sucky
> in itself, of course), but finally, it might just be that you just don't
> have a ton of memory. I've got 2GB in my machines, although 1GB is plenty
> to cache the kernel.

I have 2Gb, the "System Cache" is around 1.5Gb, and this is PIV 3.2GHz.
There seem to be no tunables for any kind of system stuff
(savin' on support costs, do they?).
You'd be very hardpressed not to say that windows is a piece of sh*t.

The "benchmark", several times in a row:

$ time git update-index --refresh
real    0m1.766s
user    0m0.498s
sys     0m1.203s

$ time git update-index --refresh
real    0m1.766s
user    0m0.358s
sys     0m1.390s

$ time git update-index --refresh
real    0m1.781s
user    0m0.420s
sys     0m1.311s

$ time git update-index --refresh
real    0m1.875s
user    0m0.374s
sys     0m1.343s

$ time git update-index --refresh
real    0m1.766s
user    0m0.326s
sys     0m1.375s

It is always almost the same time. I don't think it's IO, looks more like
cache accesses. It is just that bad in this cygwin+win2k combination.
Besides, I don't trust "time <command>" on windows much: it returned
sys time 0 for git-update-index in a directory which was read before.
Yes, there was disk activity, I can hear it real good with that barrakuda.
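
For reference, the arithmetic behind the figures quoted in this message can
be checked with a few lines of Python. This is only a sketch using the wall
clock ("real") times copied from the text above; the variable names are made
up for illustration. It reproduces the "about 2.5 orders of magnitude"
estimate and shows how tightly the five repeated runs cluster.

```python
import math

# "real" times in seconds, copied from the messages above.
nt_first_run = 21.500          # first WinNT/cygwin measurement
linus_linux  = 0.062           # Linus's figure for the kernel tree
nt_repeats   = [1.766, 1.766, 1.781, 1.875, 1.766]
kernel_tree  = 1.344           # the 2.6.13-rc6 timing quoted above

ratio  = nt_first_run / linus_linux
orders = math.log10(ratio)

mean_repeat = sum(nt_repeats) / len(nt_repeats)
spread      = max(nt_repeats) - min(nt_repeats)

print(f"first run vs Linus: {ratio:.0f}x, {orders:.2f} orders of magnitude")
print(f"repeated runs: mean {mean_repeat:.3f}s, spread {spread:.3f}s")
print(f"repeated runs vs 2.6.13-rc6 tree: {mean_repeat / kernel_tree:.2f}x")
```

The ratio comes out a little under 350, in line with the "340 times" in the
quoted text, and the repeated runs vary by roughly a tenth of a second.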

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: Two ideas for improving git's user interface
  2006-02-01 23:33                               ` Two ideas for improving git's user interface Carl Worth
  2006-02-02  0:38                                 ` Junio C Hamano
  2006-02-02  1:23                                 ` Linus Torvalds
@ 2006-02-02 12:31                                 ` Florian Weimer
  2006-02-02 16:30                                 ` Carl Baldwin
  3 siblings, 0 replies; 110+ messages in thread
From: Florian Weimer @ 2006-02-02 12:31 UTC (permalink / raw)
  To: git

* Carl Worth:

> Here's a fundamental question I have, (and thanks to Keith Packard for
> helping me to phrase it):
>
> 	Is it ever useful (reasonable, desirable) to commit file
> 	contents that differ from the contents of the working
> 	directory?

You mean like "darcs record"? 8-)

I think this is very useful functionality.  Granted, it interferes
with a rigorous developer-side regression test policy ("all changes
must have been built and passed the test suite").  But it encourages
things like fixing typos in comments you spot while editing a file for
other reasons.  And you can keep some ugly debugging code while
working on a series of changes.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [Census] So who uses git?
  2006-02-01 22:50                               ` Junio C Hamano
@ 2006-02-02 14:59                                 ` Andreas Ericsson
  0 siblings, 0 replies; 110+ messages in thread
From: Andreas Ericsson @ 2006-02-02 14:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Linus Torvalds, git, Joel Becker

Junio C Hamano wrote:
> 
> I do not particularly have much preference among --also,
> --with-index, or --incremental, but:
> 
>  - 'with-index' is precise but might be too technical;
>  - 'incremental' is not really incremental -- you can use it
>    only once.
> 
> Because you do not have to say "git commit --also" without paths
> (which _is_ awkward) to get the traditional behaviour, maybe it
> is a good name for that flag (it is also the shortest).
> 

Except that -a, which is the logical shorthand, is already taken. How 
about --include (or --include-index, or --index) and -i? commit being a 
fairly commonly used command, I think it's safe to assume that most 
people will read the man-page or the help output if there's something 
they don't understand.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: Two ideas for improving git's user interface
  2006-02-01 23:33                               ` Two ideas for improving git's user interface Carl Worth
                                                   ` (2 preceding siblings ...)
  2006-02-02 12:31                                 ` Florian Weimer
@ 2006-02-02 16:30                                 ` Carl Baldwin
  3 siblings, 0 replies; 110+ messages in thread
From: Carl Baldwin @ 2006-02-02 16:30 UTC (permalink / raw)
  To: Carl Worth; +Cc: Linus Torvalds, Junio C Hamano, Nicolas Pitre, git

On Wed, Feb 01, 2006 at 03:33:44PM -0800, Carl Worth wrote:
> 	Is it ever useful (reasonable, desirable) to commit file
> 	contents that differ from the contents of the working
> 	directory?

What _is_ useful about the status quo is the ability to make some minor
change, update that change to the index when I've decided that it is a
good change and then use git diff to see what I've incrementally changed
in the same file since that update.  That way new incremental changes
can be viewed independently of the change I've already decided was good.

> What I would love to have is the ability to pass the same arguments to
> git diff to get a preview of what any git commit would do. For
> example, something like:
> 
> 	git diff		# would be a preview of:
> 	git commit

'git diff --cached' does this.

> 	git diff -a		# would be a preview of:
> 	git commit -a

'git diff HEAD' does this.

> 	git diff fileA fileB	# would be a preview of:
> 	git commit fileA fileB

Paths can be specified in conjunction with the above commands.

Yes, these are idioms specific to git and are not immediately intuitive
to the new user.  However, if the user has access to a good tutorial
that walks through these scenarios, it's not so bad.

Carl

-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Carl Baldwin                        RADCAD (R&D CAD)
 Hewlett Packard Company
 MS 88                               work: 970 898-1523
 3404 E. Harmony Rd.                 work: Carl.N.Baldwin@hp.com
 Fort Collins, CO 80525              home: Carl@ecBaldwin.net
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: Two ideas for improving git's user interface
  2006-02-02  2:25                                     ` Junio C Hamano
@ 2006-02-03 23:57                                       ` Carl Worth
  0 siblings, 0 replies; 110+ messages in thread
From: Carl Worth @ 2006-02-03 23:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 4527 bytes --]

[I'm still hesitant to be jumping into this discussion with both feet
like this, so please imagine lots of disclaimers of ignorance before
any claims I make---I would not be surprised or offended to learn I'm
wildly wrong about how I think some things work.]

On Wed, 01 Feb 2006 18:25:46 -0800, Junio C Hamano wrote:
> Carl Worth <cworth@cworth.org> writes:
> 
> > If not, we should be able to simplify things since a lot of the
> > UI complexity being discussed (-a vs. no -a, path names vs. no path
> > names), hinges on the handling of skewed files.
> 
> I am in agreement with you that "skewed files" might lead to
> confusion, but I do not see how that relates to "-a vs no -a" nor
> "path names vs no path names" issues.

In the case of skewed files, "-a" commits the current file content,
while "no -a" commits the skewed content. Similarly, "path names"
commits the current contents while "no path names" commits the skewed
content.

> Let's say we try to detect and forbid committing skewed files.  How
> would we do that?

I wasn't imagining adding extra checks (== more complexity). Instead I
was imagining something like a command that would mark a path to be
committed. I don't yet have a good suggestion for a short name for the
operation, but I'll call it "mark" for sake of discussion. This mark
operation would be used similarly to update-index but instead of
storing into the index an object created from the current contents of
the specified path, it would simply mark the path in the index as
to-be-committed. When committing such a path later, the object would
be created based on the contents of the path at that time.

So I imagined eliminating skewed files first by providing operations
based around "mark" rather than update-index, (since "mark" avoids all
of the confusing oops-I-committed-stale-file-contents scenarios), and
second by making all commands that update the index from the object DB
also update the working directory, (effectively making git-read-tree
always act according to its current -u).
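
To make the difference concrete, here is a minimal sketch of the two
behaviours. It is a toy model, not git code: "mark" is the hypothetical
operation described above, and the class and method names are invented for
illustration. The only point is when the file contents get read: at staging
time for update-index, at commit time for "mark".

```python
class ToyRepo:
    """Toy model: HEAD, index and working tree are dicts of path -> contents."""

    def __init__(self, head):
        self.head = dict(head)
        self.index = dict(head)
        self.worktree = dict(head)
        self.marked = set()            # state used only by the hypothetical "mark"

    def update_index(self, path):
        """Existing behaviour: snapshot the contents into the index now."""
        self.index[path] = self.worktree[path]

    def mark(self, path):
        """Hypothetical: remember only that the path is to be committed."""
        self.marked.add(path)

    def commit(self):
        """Commit the index, reading marked paths from the working tree."""
        tree = dict(self.index)
        for path in self.marked:
            tree[path] = self.worktree[path]
        self.head = tree
        self.index = dict(tree)
        self.marked.clear()
        return tree


# update-index, then a further edit: the commit records the older contents.
skewed = ToyRepo({"A": "v0"})
skewed.worktree["A"] = "v1"
skewed.update_index("A")
skewed.worktree["A"] = "v2"
skewed_result = skewed.commit()

# "mark", then a further edit: the commit records what is on disk.
marked = ToyRepo({"A": "v0"})
marked.worktree["A"] = "v1"
marked.mark("A")
marked.worktree["A"] = "v2"
marked_result = marked.commit()

print("update-index:", skewed_result, " mark:", marked_result)
```

In the first run the working tree still holds "v2" after the commit, which
is exactly the skewed state; in the second there is nothing left over.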

But as a prerequisite, this kind of plan would require the user to
never actually _want_ to stash skewed contents in the index. On a
separate branch of the current thread, Linus has said he likes to do
that, so I'll continue to discuss that there, and before the outcome
of that discussion, this idea need not even be considered further.

> 1. "git commit" is the traditional one; it commits the current index.
> 
> 2. "git commit --also fileA..." updates fileA... on top of the current
> 
> 3. "git commit fileA..." initializes a temporary index from the
> 
> 4. "git commit -a" by definition would not have skewed files and there
>    is nothing to check.

The one comment I have about this proposal is a certain lack of
orthogonality. Namely the base "commit" performs one operation,
(committing the contents of the index), and "commit --also" performs
that same operation plus something more (that much is good so
far). The problem starts with "commit file" which does not perform the
base operation at all, but just does something different. Similarly,
"commit -a" is also doing something different, (its behavior can be
described as an additional step performed _before_ the base "commit"
but could also be described as an operation independent of the
original state of the index, if I'm not mistaken).

Before "-a" existed, there was better orthogonality, but apparently
there wasn't a good fit with what some users wanted to do, (hence the
addition of "-a" and the recent proposal of yet more variations on
"commit").

>         $ git diff --cached
>         $ git commit
...
>         $ git diff HEAD
>         $ git commit -a
...
> For that you may need to do something like:
> 
> 	git-diff-index --cached HEAD ;# already in index but do not look at A
>         git-diff-index HEAD -- A ;# and path A is taken from working tree
> 
> which is a bit cumbersome.
> 
> Without --also (the new semantics), the check would be
> straightforward:
..
>  +++++  $ git diff HEAD -- A
> 	$ git commit A

Thanks for the examples. If nothing else, I hope the above makes clear
that it's not always obvious how to achieve a preview diff of a
commit. I would love to see the number of fundamental variations of
"commit" shrink rather than grow, but especially if it does grow, I
think it will always be important for users to be able to easily view
"status" and "diff" previews of commits, (preferably by providing the
same arguments to some 'preview' commands as will be passed to
commit).
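
As a rough illustration of the correspondence being asked for, the mapping
quoted above can be written down as a small lookup. This is a sketch only:
the function name is hypothetical, "--also" is the flag proposed in this
thread rather than an established option, and the returned strings simply
repeat the commands from the quoted examples.

```python
import shlex

def preview_commands(commit_args):
    """Return the command lines that preview 'git commit <commit_args>',
    following the examples quoted in this thread."""
    args = list(commit_args)
    if not args:
        return ["git diff --cached"]
    if args == ["-a"]:
        return ["git diff HEAD"]
    if args[0] == "--also":
        paths = " ".join(shlex.quote(p) for p in args[1:])
        if not paths:
            # "--also" without paths is the traditional behaviour.
            return ["git diff --cached"]
        return ["git-diff-index --cached HEAD",
                "git-diff-index HEAD -- " + paths]
    paths = " ".join(shlex.quote(p) for p in args)
    return ["git diff HEAD -- " + paths]

for example in ([], ["-a"], ["--also", "A"], ["fileA", "fileB"]):
    print("git commit", " ".join(example), "->", preview_commands(example))
```

Four commit forms need three different diff spellings, and one of them needs
two commands, which is the awkwardness described above.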

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: Two ideas for improving git's user interface
  2006-02-02  1:23                                 ` Linus Torvalds
  2006-02-02  1:44                                   ` Linus Torvalds
@ 2006-02-04  0:20                                   ` Carl Worth
  2006-02-04  2:08                                     ` Linus Torvalds
  1 sibling, 1 reply; 110+ messages in thread
From: Carl Worth @ 2006-02-04  0:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Nicolas Pitre, git

[-- Attachment #1: Type: text/plain, Size: 4048 bytes --]

On Wed, 1 Feb 2006 17:23:38 -0800 (PST), Linus Torvalds wrote:
>
> I tend to have a certain fairly constant set of changes in my working 
> tree, namely every time a release is getting closer, I always tend to have 
> the "Makefile" already updated for the new version (but not checked in: I 
> do that just before I actually tag it, so that the tag will match the 
> commit that actually changes the version).

OK. That use case I understand just fine.

> However, if the question was an even stricter "do you ever commit 
> _changes_ to a particular file where the last HEAD, the index _and_ the 
> working tree are all different", then the answer is actually "Yes" to that 
> too.

Yes, this is the question I was trying to ask. Thanks for pretending
that I had actually asked it, and then answering it as well.

> What has happened is that I have had merges that have content conflicts 
> that I fix up by hand, but exactly _because_ I fix them up by hand, I 
> actually want to re-compile the kernel and test my fixups.

OK. I hadn't anticipated this use case, but I am interested in
exploring it more fully.

> And in that case, I will actually re-apply my manual Makefile change, even 
> if that file was part of the merge changes (in which case I had had to 
> first un-apply the change in order to do the merge).

Are the un-apply and re-apply operations here primarily manual? or
does git help you much with those (beyond alerting you that the merge
cannot take place before you un-apply things)?

> The thing is, once you get used to the git "index" as a staging place, 
> it's really really powerful.

I believe that the staging operations you perform are quite desirable,
but I wonder if existing primitives in git might not provide a more
powerful basis for the kinds of operation you're performing.

For example, in the case of the not-quite-ready-to-be-committed
changes that you want to carry along, couldn't you get additional
benefits if those changes could live on their own branch? I suppose
there may be a missing operator needed to allow you to easily merge
*and* unmerge that branch if needed. Would that seem at all feasible?

If so, could your not-ready changes be implemented as some branch that
is automatically unmerged prior to commit and then re-merged
afterwards? Or something like that?

I guess the feeling I get is that staging into the index feels
conceptually similar to a commit to a branch, but it's a uniquely weak
branch (only one revision per file). And this uniqueness also
introduces complexity (the various diff operations), as well as
possibilities of confusion when committing. Meanwhile the response to
the commit confusion seems to be to add yet more complexity to commit
which doesn't seem like an improvement to me.

[I'm maybe too far out on a limb at this point, since you've
definitely identified a use case for staging in the index, and all
I've offered as an alternative is hand-waving about "branches should
be able to do that". But if nothing else, I'm floating some ideas out
loud, and next I'll try experimenting more with possibilities for
non-index staging.]

I'm already having a lot of fun with git. It's a very impressive tool,
with a surprisingly simple/powerful core.

> Actually, we do exactly that. Right now we expressly limit the "preview" 
> to just the filenames, but we literally do run
> 
> 	git-diff-index -M --cached --name-status --diff-filter=MDTCRA HEAD
>
> as part of "git status", and the eventual end result is what we will 
> populate the commit message file with for your editing pleasure.

Yes, that's a good thing to do. In my personal workflow, a
pre-populated commit message is a bit late, since I want to review and
convince myself I like things before I type the magic word "commit".

And I'm not claiming that a preview patch is impossible to generate,
I'm just saying that it's currently rather hard to figure out the correct
correspondence between arguments to diff and arguments to commit, (see
more on this point in another branch of this thread).

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: Two ideas for improving git's user interface
  2006-02-04  0:20                                   ` Carl Worth
@ 2006-02-04  2:08                                     ` Linus Torvalds
  2006-02-06 23:42                                       ` Carl Worth
  0 siblings, 1 reply; 110+ messages in thread
From: Linus Torvalds @ 2006-02-04  2:08 UTC (permalink / raw)
  To: Carl Worth; +Cc: Junio C Hamano, Nicolas Pitre, git

On Fri, 3 Feb 2006, Carl Worth wrote:
> 
> > And in that case, I will actually re-apply my manual Makefile change, even 
> > if that file was part of the merge changes (in which case I had had to 
> > first un-apply the change in order to do the merge).
> 
> Are the un-apply and re-apply operations here primarily manual? or
> does git help you much with those (beyond alerting you that the merge
> cannot take place before you un-apply things)?

They're purely manual. If the changes are more extensive, I just create a 
temporary branch for them, which is easy enough:

	git checkout -b temp
	git commit
	git checkout master

before I do the real merge, but the fact is, most of the changes in my 
tree tend to be pretty un-interesting. Most of the time it's literally 
_just_ the Makefile change, sometimes it's a trial patch that I'm not 
ready to commit and had just sent out to somebody for testing or similar.

> I believe that the staging operations you perform are quite desirable,
> but I wonder if existing primitives in git might not provide a more
> powerful basis for the kinds of operation you're performing.

No. The point is that they are trivial to do, and that they don't _need_ 
"powerful basis".

What they need is _usability_.

And the git index _is_ that usability. It is incredibly powerful, and 
incredibly easy to use.

When you argue against exposing the index, you argue against it from the 
"let's not give them rope" angle. You argue against power and flexibility. 

You argue for the clippy, the helper app that says

	Are you sure you want to do this?
		[Yes] [No] [Cancel]

while I'm trying to explain that it's actually part of the _power_ of git.

The fact, that I can keep dirty state in my tree and continue to work with 
it _without_ having to worry about it is a huge relief to me. 

> If so, could your not-ready changes be implemented as some branch that
> is automatically unmerged prior to commit and then re-merged
> afterwards? Or something like that?

Sure. They could. You could make things more complicated, and they would 
WORK. 

They'd be inconvenient and not offer any actual improvement.

The "index" file in git really is very important.

Staging into the index is _the_ most fundamental operation. You can't 
actually see it very well in the history of git (because the first commit 
exists only after git actually worked pretty fully), but the birth of git 
is really in the index file. That actually came _before_ the object store, 
as the way to quickly and efficiently track the notion of "changes".

So git itself started out very much with the index file being the staging 
area for tracking the state of a working tree efficiently.

No git operation actually ever lets the working tree interact directly 
with the object store. The notion of "diff this <tree> object against the 
current working tree" comes closest, but even that actually really goes 
through the index file: it's properly a "diff this <tree> object against 
the index file, and check at the same time the index entry against the 
working tree"
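
A rough sketch of that two-step comparison, under simplifying assumptions:
the toy index below keeps only a blob id, a size and a modification time per
path, whereas the real index file records more than that, and the helper
names are invented for illustration. The blob id is computed the way git
names blobs, a SHA-1 over a "blob <size>" header, a NUL byte and the data.

```python
import hashlib
import os
import tempfile

def blob_id(data: bytes) -> str:
    """SHA-1 over a 'blob <size>' header, a NUL byte, and the data."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

def stage(index, root, path):
    """Toy 'update-index': record blob id plus stat data for one path."""
    full = os.path.join(root, path)
    with open(full, "rb") as handle:
        data = handle.read()
    st = os.stat(full)
    index[path] = (blob_id(data), st.st_size, st.st_mtime_ns)

def diff_tree_via_index(tree, index, root):
    """Compare a tree ({path: blob id}) with the working tree via the index."""
    report = {}
    for path, (staged_id, size, mtime_ns) in sorted(index.items()):
        st = os.stat(os.path.join(root, path))
        if (st.st_size, st.st_mtime_ns) != (size, mtime_ns):
            # Step 2: the index entry no longer matches the file on disk.
            report[path] = "working tree differs from index"
        elif tree.get(path) != staged_id:
            # Step 1: tree object versus index, no file contents read at all.
            report[path] = "index differs from tree"
    return report

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "hello"), "wb") as handle:
            handle.write(b"Hello World\n")
        index = {}
        stage(index, root, "hello")
        print(diff_tree_via_index({"hello": index["hello"][0]}, index, root))
```

Note that the working tree file is only ever compared against its index
entry, never directly against the tree object.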

If you deny the index file, you really deny git itself.

Think of it this way: when you start a new process, in UNIX you do that in 
two stages: first you fork() to create a copy, then you do exec() to 
populate the copy with the new process. 

Your argument is akin to saying "That's horribly wasteful: wouldn't it be 
much more intuitive to just do 'spawn()' to do it all, and avoid the 
unnecessary middle step".

But that "unnecessary" middle step - whether it's "fork()" or the git 
"index" file - is actually the source of the flexibility. It's what allows 
you to do the "fixups" in the middle when you switch file descriptors 
around, or when you fix up merge conflicts.

And then occasionally, you do fork() _without_ doing an execve() at all. 
The same way that sometimes you do operations on the index without 
actually committing them to a tree.

That's flexibility. Revel in it, instead of trying to push it under the 
rug. 
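
For readers who have not used the two-stage pattern directly, here is a
minimal sketch of it in Python on a Unix-like system. The function name is
made up for illustration; the calls are the standard os module wrappers
around pipe(), fork(), dup2(), execvp() and waitpid(). The file descriptor
shuffle between fork and exec is the "fixup in the middle" that a single
spawn() call would leave no room for.

```python
import os

def run_capturing_stdout(argv):
    """fork(), rearrange file descriptors in the child, then exec argv."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this is the middle step where the fixups happen.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)       # stdout now points at the pipe
            os.close(write_fd)
            os.execvp(argv[0], argv)   # does not return on success
        finally:
            os._exit(127)              # reached only if exec failed
    # Parent: read what the child wrote, then reap it.
    os.close(write_fd)
    chunks = []
    while True:
        data = os.read(read_fd, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return exit_code, b"".join(chunks)

if __name__ == "__main__":
    print(run_capturing_stdout(["echo", "hello"]))
```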

			Linus

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: Two ideas for improving git's user interface
  2006-02-02  1:44                                   ` Linus Torvalds
@ 2006-02-04  8:03                                     ` Alan Chandler
  2006-02-04  8:25                                       ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Alan Chandler @ 2006-02-04  8:03 UTC (permalink / raw)
  To: git

On Thursday 02 February 2006 01:44, Linus Torvalds wrote:
> On Wed, 1 Feb 2006, Linus Torvalds wrote:
> > And notice how I commit the _merge_ without actually committing my dirty
> > state in the tree - and whether the files involved in my standard dirty
> > changes ("Makefile") are part of the state that the merge changed or not
> > is _totally_ irrelevant.
>
> If you get the feeling that merging is special, then to some degree, yes,
> you'd be right.
>
> Merging (especially with conflicts) is the _one_ operation where you
> absolutely have to know about the index. If you don't know about how the
> index works, you can get the conflict resolution right kind of by
> accident, simply because the default workflow of
>
> 	.. edit conflict to look ok ..
> 	git commit file/with/conflict
>
> actually happens to do exactly the right thing (very much on purpose,
> btw), but the fact is, to actually figure out more complicated conflicts
> and to _understand_ what happens, you absolutely need to be aware of the
> index. Not being aware of it just isn't an option for any serious git
> user.
>
> (Btw, I think this is where cogito falls down. Cogito tries to hide the
> index file, but I don't think you really _can_ hide the index file and
> also do merges well at the same time. Anybody who has non-trivial merges
> should use raw git - not just because the "recursive" strategy just works
> better, but exactly because of the index file issue).

Wow - light comes on.

I have been using git (or rather to be exact git with cg-add, cg-rm and 
cg-commit) for about 6 months (bearing in mind I am only a part time 
programmer in the evenings for fun - even though I work in the computer 
industry the last time I was paid to write code was in 1979 - so I don't 
really need to be a power user).  Although I knew about the index file since 
the beginning I never really grokked what it was about before.

Of course I knew of its existence, and I even knew that it could be used as a 
staging area, but up to now I had always thought of it as a necessary 
inconvenience to enable git to run as blazingly fast as it does - not as an 
essential part of the workflow in complex situations.

I think the problem is with three crucial bits of documentation. Firstly, the 
documentation is full of "git doesn't do porcelain" statements - pushing towards 
cogito, which then hides the existence of the index file.  Git not doing 
porcelain was true at the very beginning, but I don't think that it is true 
any longer.

Secondly, the tutorial.  The examples given start by using commands to 
explicitly update the index and then they move on to show how you don't need 
to do that by using the more advanced commands of git-add and git-commit.  So 
as I was trying to learn how to use git, I followed through this and thought 
that you just try to avoid using it directly.  What's more, viewed in this 
light git-commit seemed to be a rather poor implementation of cogito's 
superior cg-commit command.

[Incidentally there is a use case that doesn't seem to have been discussed in 
this thread which I use cg-commit all the time for and will now have to see 
if there is an index file equivalent for.  That is, I am developing a web 
application and in the running version the database framework (iBatis) is 
using Tomcat's connection pooling.  In order to run my JUnit test harness, I 
don't have Tomcat, so I need to define a different version of the iBatis 
configuration file to use its own database connection.  So I have created a 
test branch and edited the configuration file in that branch, and I update 
both code and tests in an edit/compile/fix and test loop until I have written 
or changed both code and tests.  I then do a cg-commit which lists the files 
I have changed.  I ONLY commit those in the test harness - by deleting the 
others from cogito's list of files to commit - and then repeat the commit, 
committing the rest.  I then switch back to my master branch and cherry pick 
the commit that is the code changes - not the test harness.] 

Thirdly, the "discussion" of the index file at the bottom end of the git man page 
(The "index" aka "Current Directory Cache") really concentrates on what it is 
and what operations you can perform with it in the normal situation.

I tried looking at the core tutorial to see what might be a way of bringing 
this to the attention of the new learner of git and produced the following 
(partial) patch to the core-tutorial.  (It needs a whole set of examples on 
resolving merge problems which I have no idea at the moment how to do - this 
has been the real area which I never understood - basically because the 
tutorial itself says skip that part.)

--- a/Documentation/core-tutorial.txt
+++ b/Documentation/core-tutorial.txt
@@ -212,15 +212,22 @@ was just to show that `git-update-index`
 actually saved away the contents of your files into the git object
 database.

+The Index File
+--------------
+
 Updating the index did something else too: it created a `.git/index`
 file. This is the index that describes your current working tree, and
-something you should be very aware of. Again, you normally never worry
-about the index file itself, but you should be aware of the fact that
-you have not actually really "checked in" your files into git so far,
-you've only *told* git about them.
+something you should be very aware of.  It is a staging area between your
+working tree and the object store described above.
+
+In normal circumstances you do not worry about the index file itself, but you
+should be aware of the fact that you have not actually really "checked in"
+your files into git so far, you've only *told* git about them.  Later you
+will see how you can exploit the fact that there is this separate index
+file to undertake more complex operations.

-However, since git knows about them, you can now start using some of the
-most basic git commands to manipulate the files or look at their status.
+However, since git knows about these files, you can now start using some of
+the most basic git commands to manipulate them or look at their status.

 In particular, let's not even check in the two files into git yet, we'll
 start off by adding another line to `hello` first:
@@ -1188,8 +1195,8 @@ How does the merge work?
 We said this tutorial shows what plumbing does to help you cope
 with the porcelain that isn't flushing, but we so far did not
 talk about how the merge really works.  If you are following
-this tutorial the first time, I'd suggest to skip to "Publishing
-your work" section and come back here later.
+this tutorial the first time, I'd suggest to skip to "Resolving Merge
+Problems" section and come back here later.

 OK, still with me?  To give us an example to look at, let's go
 back to the earlier repository with "hello" and "example" file,
@@ -1332,6 +1339,10 @@ merge for you to resolve.  Notice that t
 unmerged, and what you see with `git diff` at this point is
 differences since stage 2 (i.e. your version).

+Resolving Merge Problems
+------------------------
+
+NOT SURE WHAT GOES HERE

 Publishing your work
 --------------------

-- 
Alan Chandler
http://www.chandlerfamily.org.uk
Open Source. It's the difference between trust and antitrust.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: Two ideas for improving git's user interface
  2006-02-04  8:03                                     ` Alan Chandler
@ 2006-02-04  8:25                                       ` Junio C Hamano
  2006-02-04  9:30                                         ` Alan Chandler
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2006-02-04  8:25 UTC (permalink / raw)
  To: Alan Chandler; +Cc: git

Alan Chandler <alan@chandlerfamily.org.uk> writes:

> Wow - light comes on.

That's good.

> -this tutorial the first time, I'd suggest to skip to "Publishing
> -your work" section and come back here later.
> +this tutorial the first time, I'd suggest to skip to "Resolving Merge
> +Problems" section and come back here later.

The changes before this look very good to me, but these two
lines do not make any sense. If you are going to talk about
"Resolving Merge Problems", you _need_ to know about index, so
you cannot skip the material.

I think having a section on manual merge resolution between the
Index File section and Publishing section makes sense.  What
kind of merges did you have trouble figuring out when you were
still git novice?  That would be a good starting point.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: Two ideas for improving git's user interface
  2006-02-04  8:25                                       ` Junio C Hamano
@ 2006-02-04  9:30                                         ` Alan Chandler
  0 siblings, 0 replies; 110+ messages in thread
From: Alan Chandler @ 2006-02-04  9:30 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

On Saturday 04 February 2006 08:25, Junio C Hamano wrote:
> Alan Chandler <alan@chandlerfamily.org.uk> writes:
> > Wow - light comes on.
>
> That's good.
>
> > -this tutorial the first time, I'd suggest to skip to "Publishing
> > -your work" section and come back here later.
> > +this tutorial the first time, I'd suggest to skip to "Resolving Merge
> > +Problems" section and come back here later.
>
> The changes before this look very good to me, but these two
> lines do not make any sense. If you are going to talk about
> "Resolving Merge Problems", you _need_ to know about index, so
> you cannot skip the material.

Maybe - since the light has just come on, I need to understand a lot more 
about this area before I can really comment.  The tutorial invited one to 
skip it before, so I was just doing so again.

Even today when I tried to read this section again my eyes glazed over.  The
long sha outputs from git-ls-files scream off the page "don't bother this is
detailed technical stuff" :-(

>
> I think having a section on manual merge resolution between the
> Index File section and Publishing section makes sense.  What
> kind of merges did you have trouble figuring out when you were
> still git novice?  That would be a good starting point.
>

I STILL come out in a cold sweat (actually that is a bit over the top:-) ) as 
soon as a merge fails for whatever reason.  The problem is that I am not 
doing development full time, nor in a team, so I probably hit one about once 
every 2 months.  This means that I don't remember what to do, and need to go 
and look it up.  But where - there is nothing in my main reference places 
(Everyday Git - or before that the tutorial). 

So I normally attempt to do what I think is sensible.  Manually searching for
files that haven't merged. Edit the lines with the
 <<<<<<<
 =======
 >>>>>>>
markers in them until I think the resultant file is what it should be and then
try to commit again (probably cg-commit rather than git commit).  But what
happens next is then hit or miss - sometimes it just works - sometimes it
doesn't and I am in that place where there was a long thread a couple of months
ago entitled something like "and what do I do now?"

I must admit it normally works OK - but I have come across situations a couple 
of weeks later where a file is in an unexpected state - seems to have been 
from the wrong branch, or missing a commit I thought I had made.
-- 
Alan Chandler
http://www.chandlerfamily.org.uk
Open Source. It's the difference between trust and antitrust.

* Re: [Census] So who uses git?
  2006-02-01  2:04                     ` Linus Torvalds
                                         ` (3 preceding siblings ...)
  2006-02-01 19:20                       ` Julian Phillips
@ 2006-02-06 21:15                       ` Chuck Lever
  4 siblings, 0 replies; 110+ messages in thread
From: Chuck Lever @ 2006-02-06 21:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ray Lehtiniemi, Alex Riesen, Radoslaw Szkodzinski, Keith Packard,
	Junio C Hamano, cworth, Martin Langhoff, Git Mailing List,
	Trond Myklebust

Linus Torvalds wrote:
>>for comparison, one of our sandboxes is sitting on an NTFS file system,
>>accessed via SMB:
>>
>>  smbfs$ time git update-index --refresh
>>  real    11m36.502s
>>  user    0m6.830s
>>  sys     0m5.086s
> 
> 
> Ouch, ouch, ouch.
> 
> Sounds like every single stat() will go out the wire. I forget what the 
> Linux NFS client does, but I _think_ it has a metadata timeout that avoids 
> this. But it might be as bad under NFS.
> 
> Has anybody used git over NFS? If it's this bad (or even close to), I 
> guess the "mark files as up-to-date in the index" approach is a really 
> good idea..
> 
> Of course, the whole point of git is that you should keep your repository 
> close, but sometimes NFS - or similar - is enforced upon you by other 
> issues, like the fact that the powers-that-be want anonymous workstations 
> and everybody should work with a home-directory automounted over NFS..

yes, i keep my Linux kernel repository in NFS (and my stgit and git 
repositories too).

there are some things that are slow precisely because my think time is 
longer than the NFS client's attribute timeout, which means that all of 
git's lstat()s turn into GETATTRs.

using the "noatime,nodiratime,actimeo=7200" mount options can have some 
benefit.  however, i found that keeping the repository packed provides 
the greatest positive impact.  that means that most of the objects are 
in a single file, and can be validated with just one GETATTR.

one thing we might conclude from this is that making "packing" an 
efficient operation (or even an incremental one) would go a long way to 
helping performance on network file systems.

* Re: Two ideas for improving git's user interface
  2006-02-04  2:08                                     ` Linus Torvalds
@ 2006-02-06 23:42                                       ` Carl Worth
  0 siblings, 0 replies; 110+ messages in thread
From: Carl Worth @ 2006-02-06 23:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Nicolas Pitre, git

On Fri, 3 Feb 2006 18:08:19 -0800 (PST), Linus Torvalds wrote:
> 
> If you deny the index file, you really deny git itself.
> 

[And the novice nearly reaches enlightenment.]

OK. I now thoroughly understand that this use of the index is by
design, not accident. So I won't propose modifying the index again.

But I'm still not sure how to reconcile my personal workflow with
git. I don't think I'm yet an index-embracer that would relish
using "update-index <file>; commit" for everything. At the same time,
I can appreciate the disdain for "commit -a" to the extent that it
does deny the index.

I suspect that what I want might fall somewhere in between, with
something like "mark <file>; commit-marked" where commit-marked would
update the index of all marked files, then commit. This would allow me
to still use update-index for the times that I actually need to take
advantage of what that provides.

So, maybe I'll try scripting up something for that myself, (at the
risk of re-inventing cogito). Or maybe I'll just learn to live with
(and love?) what git provides already.

-Carl

* [PATCH] "Assume unchanged" git
  2006-02-01  2:09                       ` Linus Torvalds
@ 2006-02-09  5:15                         ` Junio C Hamano
  2006-02-09  5:49                           ` [PATCH] "Assume unchanged" git: do not set CE_VALID with --refresh Junio C Hamano
  2006-02-09  5:50                           ` [PATCH] ls-files: debugging aid for CE_VALID changes Junio C Hamano
  0 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-09  5:15 UTC (permalink / raw)
  To: git
  Cc: Alex Riesen, Radoslaw Szkodzinski, Keith Packard, cworth,
	Martin Langhoff, Linus Torvalds

Linus Torvalds <torvalds@osdl.org> writes:

> The real meat is just making sure that CE_VALID gets set/cleared properly.

Setting is the easier part.  Deciding when to ignore/clear for the
sake of safety and usability is harder.  I think I got the
basics right but we might want to pass "really" from more places.

This is _not_ 1.2 material, but I think it is ready to be tested
by people who asked for this feature.  It applies on top of the
recent master branch.

-- >8 --
[PATCH] "Assume unchanged" git

This adds "assume unchanged" logic, started by this message in the list
discussion recently:

	<Pine.LNX.4.64.0601311807470.7301@g5.osdl.org>

This is a workaround for filesystems whose lstat() is not quick
enough for the index mechanism to take advantage of.  On the
paths marked as "assumed to be unchanged", the user needs to
explicitly use update-index to register the object name to be
in the next commit.

You can use two new options to update-index to set and reset the
CE_VALID bit:

	git-update-index --assume-unchanged path...
	git-update-index --no-assume-unchanged path...

These forms manipulate only the CE_VALID bit; they do not change
the object name recorded in the index file.  Nor do they add a new
entry to the index.
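
The flag layout can be looked at in isolation.  What follows is a
minimal sketch, not code from the patch: the masks are the ones this
patch leaves in cache.h, while struct flag_entry and the toy_* helpers
are made-up names that model only the 16-bit ce_flags word, kept in
network byte order as the index does.

```c
#include <arpa/inet.h>  /* htons, ntohs */
#include <assert.h>
#include <stdint.h>

/* Masks as they appear in cache.h after this patch. */
#define CE_NAMEMASK   (0x0fff)
#define CE_STAGEMASK  (0x3000)
#define CE_UPDATE     (0x4000)
#define CE_VALID      (0x8000)
#define CE_STAGESHIFT 12

/* Hypothetical stand-in for the single cache_entry field we need. */
struct flag_entry {
	uint16_t ce_flags;  /* network byte order */
};

/* Build the flags word from a name length and a merge stage (0..3). */
void toy_init(struct flag_entry *ce, unsigned namelen, unsigned stage)
{
	assert(namelen <= CE_NAMEMASK && stage <= 3);
	ce->ce_flags = htons((uint16_t)(namelen | (stage << CE_STAGESHIFT)));
}

/* What --assume-unchanged does to the entry: set one bit. */
void toy_assume_unchanged(struct flag_entry *ce)
{
	ce->ce_flags |= htons(CE_VALID);
}

/* What --no-assume-unchanged does: clear that same bit. */
void toy_no_assume_unchanged(struct flag_entry *ce)
{
	ce->ce_flags &= (uint16_t)~htons(CE_VALID);
}

int toy_is_valid(const struct flag_entry *ce)
{
	return (ce->ce_flags & htons(CE_VALID)) != 0;
}

unsigned toy_namelen(const struct flag_entry *ce)
{
	return ntohs(ce->ce_flags) & CE_NAMEMASK;
}

unsigned toy_stage(const struct flag_entry *ce)
{
	return (ntohs(ce->ce_flags) & CE_STAGEMASK) >> CE_STAGESHIFT;
}
```

For an entry built with toy_init(&ce, 7, 2), setting and then clearing
the bit leaves the name length and the stage untouched, which is why
neither option needs to rewrite anything else in the entry.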

When the configuration variable "core.ignorestat = true" is set,
the index entries are marked with CE_VALID bit automatically
after:

 - update-index to explicitly register the current object name to the
   index file.

 - when update-index --refresh finds the path to be up-to-date.

 - when tools like read-tree -u and apply --index update the working
   tree file and register the current object name to the index file.

The flag is dropped upon read-tree that does not check out the index
entry.  This happens regardless of the core.ignorestat settings.

Index entries marked with CE_VALID bit are assumed to be
unchanged most of the time.  However, there are cases where the
CE_VALID bit is ignored for the sake of safety and usability:

 - while "git-read-tree -m" or git-apply need to make sure
   that the paths involved in the merge do not have local
   modifications.  This sacrifices performance for safety.

 - when git-checkout-index -f -q -u -a tries to see if it needs
   to checkout the paths.  Otherwise you can never check
   anything out ;-).

 - when git-update-index --really-refresh (a new flag) tries to
   see if the index entry is up to date.  You can start with
   everything marked as CE_VALID and run this once to drop
   CE_VALID bit for paths that are modified.

Most notably, "update-index --refresh" honours CE_VALID and does
not actively stat, so after you modified a file in the working
tree, update-index --refresh would not notice until you tell the
index about it with "git-update-index path" or "git-update-index
--no-assume-unchanged path".
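
The short-circuit behind this can be sketched without the rest of the
index code.  This is a simplified model, assuming an entry that records
only mtime and size in host byte order; struct stat_entry, struct
toy_stat and toy_match_stat() are illustrative names, and only the
ignore_valid test mirrors what the patch adds to ce_match_stat().

```c
#include <assert.h>
#include <stdint.h>

#define CE_VALID 0x8000

/* Hypothetical, cut-down index entry (host byte order for brevity). */
struct stat_entry {
	uint16_t flags;
	long mtime;
	long size;
};

/* Hypothetical, cut-down result of lstat() on the working tree file. */
struct toy_stat {
	long mtime;
	long size;
};

/*
 * Returns nonzero when the working tree file looks changed.
 * With ignore_valid == 0 a CE_VALID entry is reported as unchanged
 * without looking at the stat data at all.
 */
int toy_match_stat(const struct stat_entry *ce, const struct toy_stat *st,
		   int ignore_valid)
{
	assert(ce && st);
	if (!ignore_valid && (ce->flags & CE_VALID))
		return 0;
	return ce->mtime != st->mtime || ce->size != st->size;
}
```

With CE_VALID set and a file that has in fact been edited, the call
with ignore_valid == 0 (the --refresh case) reports "unchanged", while
ignore_valid == 1 (the --really-refresh, read-tree -m and apply cases)
sees the difference.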

This version is not expected to be perfect.  I think diff
between index and/or tree and working files may need some
adjustment, and there are probably other cases where we should
automatically unmark paths that are marked as CE_VALID.

But the basics seem to work, and it is ready to be tested by
people who asked for this feature.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 apply.c          |    2 +-
 cache.h          |    6 +++--
 checkout-index.c |    1 +
 config.c         |    5 ++++
 diff-files.c     |    2 +-
 diff-index.c     |    2 +-
 diff.c           |    2 +-
 entry.c          |    2 +-
 environment.c    |    1 +
 read-cache.c     |   28 +++++++++++++++++++----
 read-tree.c      |    2 +-
 update-index.c   |   65 ++++++++++++++++++++++++++++++++++++++++++++++++------
 write-tree.c     |    2 +-
 13 files changed, 99 insertions(+), 21 deletions(-)

b169290f100cfa67b785c361bcae83f807487f5e
diff --git a/apply.c b/apply.c
index 2ad47fb..35ae48e 100644
--- a/apply.c
+++ b/apply.c
@@ -1309,7 +1309,7 @@ static int check_patch(struct patch *pat
 					return -1;
 			}
 
-			changed = ce_match_stat(active_cache[pos], &st);
+			changed = ce_match_stat(active_cache[pos], &st, 1);
 			if (changed)
 				return error("%s: does not match index",
 					     old_name);
diff --git a/cache.h b/cache.h
index bdbe2d6..cd58fad 100644
--- a/cache.h
+++ b/cache.h
@@ -91,6 +91,7 @@ struct cache_entry {
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 #define CE_UPDATE    (0x4000)
+#define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
 #define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
@@ -144,8 +145,8 @@ extern int add_cache_entry(struct cache_
 extern int remove_cache_entry_at(int pos);
 extern int remove_file_from_cache(const char *path);
 extern int ce_same_name(struct cache_entry *a, struct cache_entry *b);
-extern int ce_match_stat(struct cache_entry *ce, struct stat *st);
-extern int ce_modified(struct cache_entry *ce, struct stat *st);
+extern int ce_match_stat(struct cache_entry *ce, struct stat *st, int);
+extern int ce_modified(struct cache_entry *ce, struct stat *st, int);
 extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, const char *type);
 extern int index_pipe(unsigned char *sha1, int fd, const char *type, int write_object);
@@ -161,6 +162,7 @@ extern int commit_index_file(struct cach
 extern void rollback_index_file(struct cache_file *);
 
 extern int trust_executable_bit;
+extern int assume_unchanged;
 extern int only_use_symrefs;
 extern int diff_rename_limit_default;
 extern int shared_repository;
diff --git a/checkout-index.c b/checkout-index.c
index 53dd8cb..957b4a8 100644
--- a/checkout-index.c
+++ b/checkout-index.c
@@ -116,6 +116,7 @@ int main(int argc, char **argv)
 	int all = 0;
 
 	prefix = setup_git_directory();
+	git_config(git_default_config);
 	prefix_length = prefix ? strlen(prefix) : 0;
 
 	if (read_cache() < 0) {
diff --git a/config.c b/config.c
index 8355224..7dbdce1 100644
--- a/config.c
+++ b/config.c
@@ -222,6 +222,11 @@ int git_default_config(const char *var, 
 		return 0;
 	}
 
+	if (!strcmp(var, "core.ignorestat")) {
+		assume_unchanged = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.symrefsonly")) {
 		only_use_symrefs = git_config_bool(var, value);
 		return 0;
diff --git a/diff-files.c b/diff-files.c
index d24d11c..c96ad35 100644
--- a/diff-files.c
+++ b/diff-files.c
@@ -191,7 +191,7 @@ int main(int argc, const char **argv)
 			show_file('-', ce);
 			continue;
 		}
-		changed = ce_match_stat(ce, &st);
+		changed = ce_match_stat(ce, &st, 0);
 		if (!changed && !diff_options.find_copies_harder)
 			continue;
 		oldmode = ntohl(ce->ce_mode);
diff --git a/diff-index.c b/diff-index.c
index f8a102e..12a9418 100644
--- a/diff-index.c
+++ b/diff-index.c
@@ -33,7 +33,7 @@ static int get_stat_data(struct cache_en
 			}
 			return -1;
 		}
-		changed = ce_match_stat(ce, &st);
+		changed = ce_match_stat(ce, &st, 0);
 		if (changed) {
 			mode = create_ce_mode(st.st_mode);
 			if (!trust_executable_bit &&
diff --git a/diff.c b/diff.c
index ec51e7d..c72064e 100644
--- a/diff.c
+++ b/diff.c
@@ -311,7 +311,7 @@ static int work_tree_matches(const char 
 	ce = active_cache[pos];
 	if ((lstat(name, &st) < 0) ||
 	    !S_ISREG(st.st_mode) || /* careful! */
-	    ce_match_stat(ce, &st) ||
+	    ce_match_stat(ce, &st, 0) ||
 	    memcmp(sha1, ce->sha1, 20))
 		return 0;
 	/* we return 1 only when we can stat, it is a regular file,
diff --git a/entry.c b/entry.c
index 6c47c3a..8fb99bc 100644
--- a/entry.c
+++ b/entry.c
@@ -123,7 +123,7 @@ int checkout_entry(struct cache_entry *c
 	strcpy(path + len, ce->name);
 
 	if (!lstat(path, &st)) {
-		unsigned changed = ce_match_stat(ce, &st);
+		unsigned changed = ce_match_stat(ce, &st, 1);
 		if (!changed)
 			return 0;
 		if (!state->force) {
diff --git a/environment.c b/environment.c
index 0596fc6..251e53c 100644
--- a/environment.c
+++ b/environment.c
@@ -12,6 +12,7 @@
 char git_default_email[MAX_GITNAME];
 char git_default_name[MAX_GITNAME];
 int trust_executable_bit = 1;
+int assume_unchanged = 0;
 int only_use_symrefs = 0;
 int repository_format_version = 0;
 char git_commit_encoding[MAX_ENCODING_LENGTH] = "utf-8";
diff --git a/read-cache.c b/read-cache.c
index c5474d4..efbb1be 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -27,6 +27,9 @@ void fill_stat_cache_info(struct cache_e
 	ce->ce_uid = htonl(st->st_uid);
 	ce->ce_gid = htonl(st->st_gid);
 	ce->ce_size = htonl(st->st_size);
+
+	if (assume_unchanged)
+		ce->ce_flags |= htons(CE_VALID);
 }
 
 static int ce_compare_data(struct cache_entry *ce, struct stat *st)
@@ -146,9 +149,18 @@ static int ce_match_stat_basic(struct ca
 	return changed;
 }
 
-int ce_match_stat(struct cache_entry *ce, struct stat *st)
+int ce_match_stat(struct cache_entry *ce, struct stat *st, int ignore_valid)
 {
-	unsigned int changed = ce_match_stat_basic(ce, st);
+	unsigned int changed;
+
+	/*
+	 * If it's marked as always valid in the index, it's
+	 * valid whatever the checked-out copy says.
+	 */
+	if (!ignore_valid && (ce->ce_flags & htons(CE_VALID)))
+		return 0;
+
+	changed = ce_match_stat_basic(ce, st);
 
 	/*
 	 * Within 1 second of this sequence:
@@ -164,7 +176,7 @@ int ce_match_stat(struct cache_entry *ce
 	 * effectively mean we can make at most one commit per second,
 	 * which is not acceptable.  Instead, we check cache entries
 	 * whose mtime are the same as the index file timestamp more
-	 * careful than others.
+	 * carefully than others.
 	 */
 	if (!changed &&
 	    index_file_timestamp &&
@@ -174,10 +186,10 @@ int ce_match_stat(struct cache_entry *ce
 	return changed;
 }
 
-int ce_modified(struct cache_entry *ce, struct stat *st)
+int ce_modified(struct cache_entry *ce, struct stat *st, int really)
 {
 	int changed, changed_fs;
-	changed = ce_match_stat(ce, st);
+	changed = ce_match_stat(ce, st, really);
 	if (!changed)
 		return 0;
 	/*
@@ -233,6 +245,11 @@ int cache_name_compare(const char *name1
 		return -1;
 	if (len1 > len2)
 		return 1;
+
+	/* Differences between "assume up-to-date" should not matter. */
+	flags1 &= ~CE_VALID;
+	flags2 &= ~CE_VALID;
+
 	if (flags1 < flags2)
 		return -1;
 	if (flags1 > flags2)
@@ -430,6 +447,7 @@ int add_cache_entry(struct cache_entry *
 	int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
 	int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
+
 	pos = cache_name_pos(ce->name, ntohs(ce->ce_flags));
 
 	/* existing match? Just replace it. */
diff --git a/read-tree.c b/read-tree.c
index 5580f15..52f06e3 100644
--- a/read-tree.c
+++ b/read-tree.c
@@ -349,7 +349,7 @@ static void verify_uptodate(struct cache
 		return;
 
 	if (!lstat(ce->name, &st)) {
-		unsigned changed = ce_match_stat(ce, &st);
+		unsigned changed = ce_match_stat(ce, &st, 1);
 		if (!changed)
 			return;
 		errno = 0;
diff --git a/update-index.c b/update-index.c
index afec98d..767fd49 100644
--- a/update-index.c
+++ b/update-index.c
@@ -23,6 +23,10 @@ static int quiet; /* --refresh needing u
 static int info_only;
 static int force_remove;
 static int verbose;
+static int mark_valid_only = 0;
+#define MARK_VALID 1
+#define UNMARK_VALID 2
+
 
 /* Three functions to allow overloaded pointer return; see linux/err.h */
 static inline void *ERR_PTR(long error)
@@ -53,6 +57,25 @@ static void report(const char *fmt, ...)
 	va_end(vp);
 }
 
+static int mark_valid(const char *path)
+{
+	int namelen = strlen(path);
+	int pos = cache_name_pos(path, namelen);
+	if (0 <= pos) {
+		switch (mark_valid_only) {
+		case MARK_VALID:
+			active_cache[pos]->ce_flags |= htons(CE_VALID);
+			break;
+		case UNMARK_VALID:
+			active_cache[pos]->ce_flags &= ~htons(CE_VALID);
+			break;
+		}
+		active_cache_changed = 1;
+		return 0;
+	}
+	return -1;
+}
+
 static int add_file_to_cache(const char *path)
 {
 	int size, namelen, option, status;
@@ -94,6 +117,7 @@ static int add_file_to_cache(const char 
 	ce = xmalloc(size);
 	memset(ce, 0, size);
 	memcpy(ce->name, path, namelen);
+	ce->ce_flags = htons(namelen);
 	fill_stat_cache_info(ce, &st);
 
 	ce->ce_mode = create_ce_mode(st.st_mode);
@@ -105,7 +129,6 @@ static int add_file_to_cache(const char 
 		if (0 <= pos)
 			ce->ce_mode = active_cache[pos]->ce_mode;
 	}
-	ce->ce_flags = htons(namelen);
 
 	if (index_path(ce->sha1, path, &st, !info_only))
 		return -1;
@@ -128,7 +151,7 @@ static int add_file_to_cache(const char 
  * For example, you'd want to do this after doing a "git-read-tree",
  * to link up the stat cache details with the proper files.
  */
-static struct cache_entry *refresh_entry(struct cache_entry *ce)
+static struct cache_entry *refresh_entry(struct cache_entry *ce, int really)
 {
 	struct stat st;
 	struct cache_entry *updated;
@@ -137,21 +160,22 @@ static struct cache_entry *refresh_entry
 	if (lstat(ce->name, &st) < 0)
 		return ERR_PTR(-errno);
 
-	changed = ce_match_stat(ce, &st);
+	changed = ce_match_stat(ce, &st, really);
 	if (!changed)
 		return NULL;
 
-	if (ce_modified(ce, &st))
+	if (ce_modified(ce, &st, really))
 		return ERR_PTR(-EINVAL);
 
 	size = ce_size(ce);
 	updated = xmalloc(size);
 	memcpy(updated, ce, size);
 	fill_stat_cache_info(updated, &st);
+
 	return updated;
 }
 
-static int refresh_cache(void)
+static int refresh_cache(int really)
 {
 	int i;
 	int has_errors = 0;
@@ -171,12 +195,19 @@ static int refresh_cache(void)
 			continue;
 		}
 
-		new = refresh_entry(ce);
+		new = refresh_entry(ce, really);
 		if (!new)
 			continue;
 		if (IS_ERR(new)) {
 			if (not_new && PTR_ERR(new) == -ENOENT)
 				continue;
+			if (really && PTR_ERR(new) == -EINVAL) {
+				/* If we are doing --really-refresh that
+				 * means the index is not valid anymore.
+				 */
+				ce->ce_flags &= ~htons(CE_VALID);
+				active_cache_changed = 1;
+			}
 			if (quiet)
 				continue;
 			printf("%s: needs update\n", ce->name);
@@ -274,6 +305,8 @@ static int add_cacheinfo(unsigned int mo
 	memcpy(ce->name, path, len);
 	ce->ce_flags = create_ce_flags(len, stage);
 	ce->ce_mode = create_ce_mode(mode);
+	if (assume_unchanged)
+		ce->ce_flags |= htons(CE_VALID);
 	option = allow_add ? ADD_CACHE_OK_TO_ADD : 0;
 	option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0;
 	if (add_cache_entry(ce, option))
@@ -317,6 +350,12 @@ static void update_one(const char *path,
 		fprintf(stderr, "Ignoring path %s\n", path);
 		return;
 	}
+	if (mark_valid_only) {
+		if (mark_valid(p))
+			die("Unable to mark file %s", path);
+		return;
+	}
+
 	if (force_remove) {
 		if (remove_file_from_cache(p))
 			die("git-update-index: unable to remove %s", path);
@@ -467,7 +506,11 @@ int main(int argc, const char **argv)
 				continue;
 			}
 			if (!strcmp(path, "--refresh")) {
-				has_errors |= refresh_cache();
+				has_errors |= refresh_cache(0);
+				continue;
+			}
+			if (!strcmp(path, "--really-refresh")) {
+				has_errors |= refresh_cache(1);
 				continue;
 			}
 			if (!strcmp(path, "--cacheinfo")) {
@@ -493,6 +536,14 @@ int main(int argc, const char **argv)
 					die("git-update-index: %s cannot chmod %s", path, argv[i]);
 				continue;
 			}
+			if (!strcmp(path, "--assume-unchanged")) {
+				mark_valid_only = MARK_VALID;
+				continue;
+			}
+			if (!strcmp(path, "--no-assume-unchanged")) {
+				mark_valid_only = UNMARK_VALID;
+				continue;
+			}
 			if (!strcmp(path, "--info-only")) {
 				info_only = 1;
 				continue;
diff --git a/write-tree.c b/write-tree.c
index f866059..addb5de 100644
--- a/write-tree.c
+++ b/write-tree.c
@@ -111,7 +111,7 @@ int main(int argc, char **argv)
 	funny = 0;
 	for (i = 0; i < entries; i++) {
 		struct cache_entry *ce = active_cache[i];
-		if (ntohs(ce->ce_flags) & ~CE_NAMEMASK) {
+		if (ce_stage(ce)) {
 			if (10 < ++funny) {
 				fprintf(stderr, "...\n");
 				break;
-- 
1.1.6.gbb042

* [PATCH] "Assume unchanged" git: do not set CE_VALID with --refresh
  2006-02-09  5:15                         ` [PATCH] "Assume unchanged" git Junio C Hamano
@ 2006-02-09  5:49                           ` Junio C Hamano
  2006-02-09  5:50                           ` [PATCH] ls-files: debugging aid for CE_VALID changes Junio C Hamano
  1 sibling, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-09  5:49 UTC (permalink / raw)
  To: git
  Cc: Alex Riesen, Radoslaw Szkodzinski, Keith Packard, cworth,
	Martin Langhoff, Linus Torvalds

When working with automatic assume-unchanged mode using
core.ignorestat, setting CE_VALID after --refresh makes things
more cumbersome to use.  Consider this scenario:

 (1) the working tree is on a filesystem with slow lstat(2).
     The user sets core.ignorestat = true.

 (2) "git checkout" to switch to a different branch (or initial
     checkout) updates all paths and the index starts out with
     "all clean".

 (3) The user knows she wants to edit certain paths.  She uses
     update-index --no-assume-unchanged (we could call it --edit;
     the name is immaterial) to mark these paths and starts
     editing.

 (4) After editing half of the paths marked to be edited, she
     runs "git status".  This runs "update-index --refresh" to
     reduce the false hits from diff-files.

 (5) Now the other half of the paths, since she has not changed
     them, are found to match the index, and CE_VALID is set on
     them again.

For this reason, this commit makes update-index --refresh not
set CE_VALID even after paths without CE_VALID are verified
to be up to date.  The user can still run --really-refresh to
force lstat() to match the index entries to the reality.
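
The rule can be written down as a small pure function.  This is only a
sketch of my reading of the two patches, not the code itself:
toy_refreshed_flags() is a made-up name, flags are kept in host byte
order, and the ignorestat and really parameters stand in for the
core.ignorestat setting and the --really-refresh option.

```c
#include <assert.h>
#include <stdint.h>

#define CE_VALID 0x8000

/*
 * Flags an entry ends up with once a refresh has rewritten its
 * stat data.  old_flags is what the entry carried before.
 */
uint16_t toy_refreshed_flags(uint16_t old_flags, int ignorestat, int really)
{
	uint16_t updated = old_flags;

	assert(ignorestat == 0 || ignorestat == 1);

	/* Filling in stat data marks the entry when core.ignorestat is on. */
	if (ignorestat)
		updated |= CE_VALID;

	/*
	 * Plain --refresh: a path the user unmarked on purpose must not
	 * get the bit back as a side effect.
	 */
	if (!really && ignorestat && !(old_flags & CE_VALID))
		updated &= (uint16_t)~CE_VALID;

	return updated;
}
```

In the scenario above, the unedited half of the paths from step (3)
goes through with really == 0 and keeps CE_VALID cleared; only an
explicit --really-refresh sets it again.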

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 update-index.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

fd4e57f17733d85ed5346d70005ea900cb80b9ff
diff --git a/update-index.c b/update-index.c
index 767fd49..bb73050 100644
--- a/update-index.c
+++ b/update-index.c
@@ -172,6 +172,15 @@ static struct cache_entry *refresh_entry
 	memcpy(updated, ce, size);
 	fill_stat_cache_info(updated, &st);
 
+	/* In this case, if really is not set, we should leave
+	 * CE_VALID bit alone.  Otherwise, paths marked with
+	 * --no-assume-unchanged (i.e. things to be edited) will
+	 * reacquire CE_VALID bit automatically, which is not
+	 * really what we want.
+	 */
+	if (!really && assume_unchanged && !(ce->ce_flags & htons(CE_VALID)))
+		updated->ce_flags &= ~htons(CE_VALID);
+
 	return updated;
 }
 
-- 
1.1.6.gbb042

* [PATCH] ls-files: debugging aid for CE_VALID changes.
  2006-02-09  5:15                         ` [PATCH] "Assume unchanged" git Junio C Hamano
  2006-02-09  5:49                           ` [PATCH] "Assume unchanged" git: do not set CE_VALID with --refresh Junio C Hamano
@ 2006-02-09  5:50                           ` Junio C Hamano
  1 sibling, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-09  5:50 UTC (permalink / raw)
  To: git
  Cc: Alex Riesen, Radoslaw Szkodzinski, Keith Packard, cworth,
	Martin Langhoff, Linus Torvalds

This is not really part of the proposed updates for CE_VALID,
but with this change, ls-files -t shows CE_VALID paths with
lowercase tag letters instead of the usual uppercase.  Useful
for checking out what is going on.
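
As a standalone illustration of the tag rewriting, here is a sketch
that follows the same three cases as the hunk below.  toy_alt_tag() is
a hypothetical helper name, it assumes tags are short strings such as
a letter followed by a space, and it copies byte by byte instead of
using the fixed-size memcpy() in the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Write the tag to show for a CE_VALID entry into out[4]:
 *   letter -> same letter in lowercase
 *   '?'    -> '!'
 *   other  -> 'v', the original character, then a space
 */
void toy_alt_tag(const char *tag, char out[4])
{
	size_t i;

	assert(tag && *tag);

	/* Copy at most three bytes and NUL-fill the rest. */
	for (i = 0; i < 3 && tag[i]; i++)
		out[i] = tag[i];
	for (; i < 4; i++)
		out[i] = '\0';

	if (isalpha((unsigned char)tag[0])) {
		out[0] = (char)tolower((unsigned char)tag[0]);
	} else if (tag[0] == '?') {
		out[0] = '!';
	} else {
		out[0] = 'v';
		out[1] = tag[0];
		out[2] = ' ';
		out[3] = '\0';
	}
}
```

So a tag of "H " comes out as "h ", "? " as "! ", and a non-letter tag
such as "- " as "v- ".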

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 ls-files.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

775ca05ee2ba7e1f54ec4db1fed7069014364f2c
diff --git a/ls-files.c b/ls-files.c
index 6af3b09..3f06ece 100644
--- a/ls-files.c
+++ b/ls-files.c
@@ -447,6 +447,22 @@ static void show_ce_entry(const char *ta
 	if (pathspec && !match(pathspec, ce->name, len))
 		return;
 
+	if (tag && *tag && (ce->ce_flags & htons(CE_VALID))) {
+		static char alttag[4];
+		memcpy(alttag, tag, 3);
+		if (isalpha(tag[0]))
+			alttag[0] = tolower(tag[0]);
+		else if (tag[0] == '?')
+			alttag[0] = '!';
+		else {
+			alttag[0] = 'v';
+			alttag[1] = tag[0];
+			alttag[2] = ' ';
+			alttag[3] = 0;
+		}
+		tag = alttag;
+	}
+
 	if (!show_stage) {
 		fputs(tag, stdout);
 		write_name_quoted("", 0, ce->name + offset,
@@ -503,7 +519,7 @@ static void show_files(void)
 			err = lstat(ce->name, &st);
 			if (show_deleted && err)
 				show_ce_entry(tag_removed, ce);
-			if (show_modified && ce_modified(ce, &st))
+			if (show_modified && ce_modified(ce, &st, 0))
 				show_ce_entry(tag_modified, ce);
 		}
 	}
-- 
1.1.6.gbb042

* Re: [Census] So who uses git?
  2006-02-01 16:10 ` Alex Riesen
@ 2006-02-01 21:27   ` linux
  0 siblings, 0 replies; 110+ messages in thread
From: linux @ 2006-02-01 21:27 UTC (permalink / raw)
  To: linux, raa.lkml; +Cc: git, torvalds

> Inodes are either useless or dangerous in cygwin (hash of an
> absolute pathname on FAT). They may not even change after rm+touch.

Yes, I just looked it up and found that out.  I was hoping they used
first block number like many Linux FSes have tried, in which case it
would have worked, but if it's a hash of the path name, it's
guaranteed not to change.

And Linus' point is excellent, too: this feature is also useful
for automated systems (like git-applypatch) that can be assumed to
never forget to warn git ahead of time.

* Re: [Census] So who uses git?
  2006-02-01  7:08 linux
  2006-02-01  8:51 ` Junio C Hamano
  2006-02-01 16:04 ` Linus Torvalds
@ 2006-02-01 16:10 ` Alex Riesen
  2006-02-01 21:27   ` linux
  2 siblings, 1 reply; 110+ messages in thread
From: Alex Riesen @ 2006-02-01 16:10 UTC (permalink / raw)
  To: linux; +Cc: torvalds, git

On 1 Feb 2006 02:08:47 -0500, linux@horizon.com <linux@horizon.com> wrote:
> > Yes, I think the "assume unchanged" flag goes well together with making
> > sure that the checked-out file is non-writable at the time.
> >
> > Of course, any number of editors and other actions won't care: if you do
> > anything like
> >
> >       for i in *.c
> >       do
> >               sed 's/xyzzy/bas/g' < $i > $i.new
> >               mv $i.new $i
> >       done
> >
> > you'll never have even noticed that the old file was marked read-only. So
> > it's obviously not in any way any guarantee, but it probably makes sense
> > as a crutch.
>
> At the risk of complicating something already very complicated, and
> possibly breaking on Microsoft file systems, that case can be detected
> by reading the directory and noticing that the inode number changed.

Inodes are either useless or dangerous in cygwin (hash of an
absolute pathname on FAT). They may not even change after rm+touch.

* Re: [Census] So who uses git?
  2006-02-01  7:08 linux
  2006-02-01  8:51 ` Junio C Hamano
@ 2006-02-01 16:04 ` Linus Torvalds
  2006-02-01 16:10 ` Alex Riesen
  2 siblings, 0 replies; 110+ messages in thread
From: Linus Torvalds @ 2006-02-01 16:04 UTC (permalink / raw)
  To: linux; +Cc: git

On Tue, 1 Feb 2006, linux@horizon.com wrote:
> 
> At the risk of complicating something already very complicated, and
> possibly breaking on Microsoft file systems, that case can be detected
> by reading the directory and noticing that the inode number changed.
> 
> Would it be worth validating the inode numbers (which can be retrieved
> in a batch) even if you don't do a full lstat()?

I don't think it's worth it. It's the unusual case anyway, and it doesn't 
even really guarantee anything either (the person _could_ just have marked 
the inode writable - not understanding what is going on, he could have 
just done a "chmod +w" behind git's back).

Together with the fact that it might not work everywhere, and that I could 
well imagine that "readdir()" is slow on cygwin too (how does it do 
"d_ino"? Maybe it has to do a stat() to emulate unix behaviour?), I'm not 
convinced it's worth it.

I think the whole "assume it's valid" is a crutch - but if we do it, we 
should make it _really_ fast, because it's also useful for automated 
procedures that _know_ which files they touch. So we should make it have 
minimal impact.

		Linus

* Re: [Census] So who uses git?
  2006-02-01  7:08 linux
@ 2006-02-01  8:51 ` Junio C Hamano
  2006-02-01 16:04 ` Linus Torvalds
  2006-02-01 16:10 ` Alex Riesen
  2 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2006-02-01  8:51 UTC (permalink / raw)
  To: linux; +Cc: git

linux@horizon.com writes:

> At the risk of complicating something already very complicated, and
> possibly breaking on Microsoft file systems, that case can be detected
> by reading the directory and noticing that the inode number changed.
>
> Would it be worth validating the inode numbers (which can be retrieved
> in a batch) even if you don't do a full lstat()?

I suspect that what you said about Microsoft filesystems is even
stronger. IIRC the latest Cygwin stopped giving d_ino regardless
of the filesystem type; you need to do a stat() anyway.

* Re: [Census] So who uses git?
@ 2006-02-01  7:08 linux
  2006-02-01  8:51 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 110+ messages in thread
From: linux @ 2006-02-01  7:08 UTC (permalink / raw)
  To: torvalds; +Cc: git, linux

> Yes, I think the "assume unchanged" flag goes well together with making 
> sure that the checked-out file is non-writable at the time.
> 
> Of course, any number of editors and other actions won't care: if you do 
> anything like
> 
> 	for i in *.c
> 	do
> 		sed 's/xyzzy/bas/g' < "$i" > "$i.new"
> 		mv "$i.new" "$i"
> 	done
> 
> you'll never have even noticed that the old file was marked read-only. So 
> it's obviously not in any way any guarantee, but it probably makes sense 
> as a crutch.

At the risk of complicating something already very complicated, and
possibly breaking on Microsoft file systems, that case can be detected
by reading the directory and noticing that the inode number changed.

Would it be worth validating the inode numbers (which can be retrieved
in a batch) even if you don't do a full lstat()?

Or is that too Unix-centric and prone to performance problems on other
file systems?  I'd think that, even if a file system used fake inode
numbers, they'd be pretty consistent if you didn't touch the file at all,
and being different would just cause a more expensive validation.
Which would be okay as long as it's infrequent.
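
To make the proposal concrete, here is a minimal Python sketch; it is not git
code. It assumes a Unix-like file system where a directory scan reports inode
numbers without a per-file stat(). `snapshot_inodes`, `changed_names` and
`rewrite_via_rename` are hypothetical helper names. The sketch imitates the
rewrite-and-rename loop quoted above on a read-only file and compares the
inode numbers seen before and after.

```python
import os
import tempfile


def snapshot_inodes(directory):
    """Map each name in the directory to the inode number reported by the scan."""
    with os.scandir(directory) as entries:
        return {e.name: e.inode() for e in entries}


def changed_names(before, after):
    """Names whose inode number differs, or that appeared or disappeared."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))


def rewrite_via_rename(path):
    """Imitate `sed ... < f > f.new; mv f.new f`: write a new file, rename it over the old."""
    new_path = path + ".new"
    with open(path) as src, open(new_path, "w") as dst:
        dst.write(src.read().replace("xyzzy", "bas"))
    os.replace(new_path, path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.c", "b.c"):
            p = os.path.join(d, name)
            with open(p, "w") as f:
                f.write("xyzzy\n")
            os.chmod(p, 0o444)  # read-only, as in the scenario quoted above
        before = snapshot_inodes(d)
        rewrite_via_rename(os.path.join(d, "a.c"))
        after = snapshot_inodes(d)
        return changed_names(before, after)
```

The read-only bit does not stop the rename, but the replaced file shows up
under a new inode number, so only that name would need the more expensive
validation. Whether the directory scan is actually cheaper than lstat() on a
given platform is a separate question that this sketch does not measure.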


end of thread, other threads:[~2006-02-09  5:50 UTC | newest]

Thread overview: 110+ messages
2006-01-26  2:10 LCA06 Cogito/GIT workshop - (Re: git-whatchanged: exit out early on errors) Martin Langhoff
2006-01-28  4:47 ` Linus Torvalds
2006-01-28  5:33   ` Martin Langhoff
2006-01-28  5:53     ` Linus Torvalds
2006-01-28  6:32       ` Junio C Hamano
2006-01-29 10:12       ` Fredrik Kuivinen
2006-01-29 20:15         ` Junio C Hamano
2006-01-28 11:00     ` Keith Packard
2006-01-28 21:08       ` [Census] So who uses git? Junio C Hamano
2006-01-29  2:14         ` Morten Welinder
2006-01-29  3:53           ` Junio C Hamano
2006-01-29 14:19             ` Morten Welinder
2006-01-29 20:15               ` Junio C Hamano
2006-01-29 10:09         ` Keith Packard
2006-01-29 11:18           ` Radoslaw Szkodzinski
2006-01-29 18:12             ` Greg KH
2006-01-31 18:33               ` Radoslaw Szkodzinski
2006-01-31 19:50                 ` Radoslaw Szkodzinski
2006-01-31 20:43                   ` Junio C Hamano
2006-01-31 21:02                     ` Radoslaw Szkodzinski
2006-01-30 22:51             ` Alex Riesen
2006-01-31 21:25               ` Linus Torvalds
2006-01-31 21:52                 ` J. Bruce Fields
2006-01-31 22:01                 ` Alex Riesen
     [not found]                   ` <20060201013901.GA16832@mail.com>
2006-02-01  2:04                     ` Linus Torvalds
2006-02-01  2:09                       ` Linus Torvalds
2006-02-09  5:15                         ` [PATCH] "Assume unchanged" git Junio C Hamano
2006-02-09  5:49                           ` [PATCH] "Assume unchanged" git: do not set CE_VALID with --refresh Junio C Hamano
2006-02-09  5:50                           ` [PATCH] ls-files: debugging aid for CE_VALID changes Junio C Hamano
2006-02-01  2:31                       ` [Census] So who uses git? Junio C Hamano
2006-02-01  3:43                         ` Linus Torvalds
2006-02-01  7:03                           ` Junio C Hamano
     [not found]                         ` <20060201045337.GC25753@mail.com>
2006-02-01  5:04                           ` Linus Torvalds
2006-02-01  5:42                           ` Junio C Hamano
2006-02-01 16:15                       ` Jason Riedy
2006-02-01 19:20                       ` Julian Phillips
2006-02-01 19:29                         ` Linus Torvalds
2006-02-06 21:15                       ` Chuck Lever
2006-02-01  2:52                     ` Martin Langhoff
2006-02-01  3:48                       ` Linus Torvalds
2006-02-01 19:30                         ` H. Peter Anvin
2006-02-01 14:55                       ` Alex Riesen
2006-02-01 16:25                         ` Linus Torvalds
2006-02-02  9:12                           ` Alex Riesen
2006-01-29 18:37         ` Dave Jones
2006-01-29 20:17           ` Daniel Barkalow
2006-01-29 20:29             ` Martin Langhoff
2006-01-30 15:23             ` Mike McCormack
2006-01-30 18:58         ` Carl Baldwin
2006-01-31 10:27           ` Johannes Schindelin
2006-01-31 15:24             ` Carl Baldwin
2006-01-31 15:31               ` Johannes Schindelin
2006-01-31 17:30             ` Linus Torvalds
2006-01-31 18:12               ` J. Bruce Fields
2006-01-31 19:33                 ` Junio C Hamano
2006-01-31 19:44                   ` Jon Loeliger
2006-01-31 19:52                     ` Junio C Hamano
     [not found]                     ` <7vd5i8w2nc.fsf@assigned-by-dhcp.cox.net>
2006-01-31 20:56                       ` J. Bruce Fields
2006-01-31 20:06                   ` J. Bruce Fields
2006-01-31 19:01               ` Keith Packard
2006-01-31 19:21                 ` Linus Torvalds
2006-01-31 22:55                   ` Joel Becker
2006-02-01 14:43                     ` Johannes Schindelin
2006-01-31 20:56                 ` Sam Ravnborg
2006-01-31 22:21                   ` Junio C Hamano
2006-02-01 19:34               ` H. Peter Anvin
2006-01-31 23:16             ` Daniel Barkalow
2006-01-31 23:36               ` Petr Baudis
2006-01-31 23:47               ` Junio C Hamano
2006-02-01  0:38                 ` Linus Torvalds
2006-02-01  0:52                   ` Junio C Hamano
2006-02-01  2:19                   ` Daniel Barkalow
2006-02-01  6:42                   ` Junio C Hamano
2006-02-01  7:22                     ` Carl Worth
2006-02-01  8:26                       ` Junio C Hamano
2006-02-01  9:59                         ` Randal L. Schwartz
2006-02-01 20:48                           ` Junio C Hamano
2006-02-01 17:11                     ` Linus Torvalds
2006-02-01 17:18                     ` Nicolas Pitre
2006-02-01 20:27                       ` Junio C Hamano
2006-02-01 21:09                         ` Linus Torvalds
2006-02-01 21:34                           ` Nicolas Pitre
2006-02-01 21:59                           ` Junio C Hamano
2006-02-01 22:25                             ` Nicolas Pitre
2006-02-01 22:50                               ` Junio C Hamano
2006-02-02 14:59                                 ` Andreas Ericsson
2006-02-01 22:35                             ` Linus Torvalds
2006-02-01 23:33                               ` Two ideas for improving git's user interface Carl Worth
2006-02-02  0:38                                 ` Junio C Hamano
2006-02-02  1:16                                   ` Carl Worth
2006-02-02  2:25                                     ` Junio C Hamano
2006-02-03 23:57                                       ` Carl Worth
2006-02-02  1:23                                 ` Linus Torvalds
2006-02-02  1:44                                   ` Linus Torvalds
2006-02-04  8:03                                     ` Alan Chandler
2006-02-04  8:25                                       ` Junio C Hamano
2006-02-04  9:30                                         ` Alan Chandler
2006-02-04  0:20                                   ` Carl Worth
2006-02-04  2:08                                     ` Linus Torvalds
2006-02-06 23:42                                       ` Carl Worth
2006-02-02 12:31                                 ` Florian Weimer
2006-02-02 16:30                                 ` Carl Baldwin
2006-02-01 22:57                             ` [Census] So who uses git? Daniel Barkalow
2006-02-01 22:00                         ` Joel Becker
2006-02-01 19:32           ` H. Peter Anvin
2006-02-01  7:08 linux
2006-02-01  8:51 ` Junio C Hamano
2006-02-01 16:04 ` Linus Torvalds
2006-02-01 16:10 ` Alex Riesen
2006-02-01 21:27   ` linux
