* Hey - A Conceptual Simplication....
@ 2009-11-18 12:55 George Dennie
  2009-11-18 13:18 ` Jonathan del Strother
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: George Dennie @ 2009-11-18 12:55 UTC (permalink / raw)
  To: git; +Cc: torvalds

A Clean checkout command might be...

The Git model does not seem to go far enough conceptually, for some
unexplainable reason...

In particular, why does Git not treat the entire working tree as the
versioned document (qualified, of course, by the .gitignore file)?

Instead, Git is treating a manually maintained list of files within the
working tree as the versioned document, this list being initialized and
manually amended by the "Git add/rm/mv" commands, etc. 

The result is conceptual complexity and rather counter-intuitive behavior.
For example, adding and renaming files outside of Git is not considered
editing the version until you subsequently do a "git add ."; contrast that
with editing or deleting files outside of Git. Yet adding and renaming files
and folders is a significant part of substantive projects, especially in the
early stages and experimental branches.

Granted, this is not a big deal functionally, but what is being lost is
conceptual simplicity (and consistency, in my book) and conceptual
simplicity is a key value point, if not THE key.

Also, can we augment checkout to totally CLEAN the working directory prior to
a restore? If necessary, we can augment .gitignore to stipulate those files
or folders that should be excluded from the cleaning. This suggestion is in
recognition of the fact that if you are not versioning a file, it is
typically trash, which becomes the case when the entire working tree is
treated as the versioned document.

Consequently, I recommend the following new commands:
	"git commit -x"   -- performs a "git add ." and then a "git commit"
	"git checkout -x" -- cleans the working tree before performing a
	                     checkout

P.S.
Great work.

George Dennie, BMath
The Point Of Sale People
www.pospeople.com
BUS: 416-496-2921
FAX: 416-496-9496

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Hey - A Conceptual Simplication....
  2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie
@ 2009-11-18 13:18 ` Jonathan del Strother
  2009-11-18 13:25 ` Jan Krüger
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: Jonathan del Strother @ 2009-11-18 13:18 UTC (permalink / raw)
  To: George Dennie; +Cc: git, torvalds

2009/11/18 George Dennie <gdennie@pospeople.com>:
> A Clean checkout command might be...
>
> The Git model does not seem to go far enough conceptually, for some
> unexplainable reason...
>
> In particular, why is Git not treating the entire working tree as the
> versioned document (qualified of course by the .gitignore file).
>
> Instead, Git is treating a manually maintained list of files within the
> working tree as the versioned document, this list being initialized and
> manually amended by the "Git add/rm/mv" commands, etc.
>
> The result is conceptual complexity and rather counter-intuitive behavior.
> For example, adding and renaming files outside of Git is not considered
> editing the version until you subsequently do a "Git Add ." Contrast that
> with editing or deleting files outside of Git. Yet adding and renaming files
> and folders is a significant part of substantive projects, especially in the
> early stages and experimental branches.
>
> Granted, this is not a big deal functionally, but what is being lost is
> conceptual simplicity (and consistency, in my book) and conceptual
> simplicity is a key value point, if not THE key.
>
> Also can we augment checkout to totally CLEAN the working directory prior to
> a restore. If necessary we can augment .gitignore to stipulate those files
> or folders that should be excluded from the cleaning. This suggestion is in
> recognition of the fact that if you  are not versioning the file, it is
> typically trash; which becomes the case when the entire working treat is
> treated as the versioned document.
>
> Consequently, I recommend the following new commands:
>        "Git commit -x"   -- performs a "Git add ." then a "Git commit"
>        "Git checkout -x" -- that clean the working tree prior to perform a
> checkout
>


Perhaps try 'git commit -a' and 'git checkout -f' ?
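
For anyone trying those two flags: a minimal sketch, run in a throwaway
repository with made-up file names, of what they do and do not cover.
Both act on files git already tracks, so an untracked file is neither
committed by 'commit -a' nor removed by 'checkout -f'.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > tracked.txt
git add tracked.txt
git commit -q -m "initial"

echo two >> tracked.txt      # edit a file git already tracks
echo new > untracked.txt     # create a file git has never been told about

git commit -q -a -m "tracked changes only"
committed=$(git ls-tree -r --name-only HEAD)   # files recorded in the new commit

echo scratch >> tracked.txt  # another uncommitted edit
git checkout -q -f           # throws away the edit to tracked.txt
leftover=$(git status --porcelain)             # what is still outside the commit

echo "in commit: $committed"
echo "left over: $leftover"
```

So the pair is close to, but not the same as, the proposed "-x"
commands: new files still need an explicit 'git add'.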


* Re: Hey - A Conceptual Simplication....
  2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie
  2009-11-18 13:18 ` Jonathan del Strother
@ 2009-11-18 13:25 ` Jan Krüger
  2009-11-18 18:51   ` George Dennie
  2009-11-18 13:30 ` Thomas Rast
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Jan Krüger @ 2009-11-18 13:25 UTC (permalink / raw)
  To: George Dennie; +Cc: git, torvalds

Hi,

> The result is conceptual complexity and rather counter-intuitive
> behavior. For example, adding and renaming files outside of Git is
> not considered editing the version until you subsequently do a "Git
> Add ." Contrast that with editing or deleting files outside of Git.
> Yet adding and renaming files and folders is a significant part of
> substantive projects, especially in the early stages and experimental
> branches.

yet even now, people routinely add huge amounts of files they didn't
actually want to add, and then have to expend a huge amount of effort
to get them out of the history again (particularly if that history has
already been published).

What you are describing is a workflow that is even fuller of potential
for wrong turns than the current standard workflow is. If simplicity
leads to a greater potential for errors, how is it a good thing?

This kind of workflow actually involves more work for the user. She now
has to meticulously maintain an accurate list of ignore patterns,
particularly because of this:

> Also can we augment checkout to totally CLEAN the working directory
> prior to a restore. If necessary we can augment .gitignore to
> stipulate those files or folders that should be excluded from the
> cleaning.

So if I forget to add a certain pattern, my file is lost forever? Uhh...

> This suggestion is in recognition of the fact that if you
> are not versioning the file, it is typically trash

Just how typical is that, though? I wouldn't want to be the one to
judge that.

In light of my concerns, I oppose adding your suggestions to the
official CLI of git and I suggest that you create your own commands to
enable this kind of workflow. For example:

git config --global alias.commitx '!git add . && git commit'
git config --global alias.checkoutx '!git clean && git checkout'
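
One caveat when trying these: with default settings 'git clean' refuses
to delete anything unless it is given -f (or -n for a dry run), so the
second alias stops before the checkout. A cautious sketch, assuming
repo-local aliases and a hypothetical 'cleanpreview' alias that only
lists what a clean would remove:

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > a.txt
git add a.txt
git commit -q -m "initial"

# Repo-local aliases (no --global), so nothing outside this repo changes.
git config alias.commitx '!git add . && git commit'
git config alias.cleanpreview 'clean -n -d'   # hypothetical name; -n = dry run

echo junk > stray.txt
mkdir build && echo obj > build/out.o

preview=$(git cleanpreview)       # lists untracked paths, deletes nothing
echo "$preview"

git commitx -q -m "everything"    # runs: git add . && git commit -q -m ...
git ls-files                      # every file above is now tracked
```

Swapping -n for -f in a real alias makes the deletion happen, which is
exactly the step that deserves a preview first.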

Jan


* Re: Hey - A Conceptual Simplication....
  2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie
  2009-11-18 13:18 ` Jonathan del Strother
  2009-11-18 13:25 ` Jan Krüger
@ 2009-11-18 13:30 ` Thomas Rast
  2009-11-18 13:31 ` Jason Sewall
  2009-11-18 20:36 ` Linus Torvalds
  4 siblings, 0 replies; 25+ messages in thread
From: Thomas Rast @ 2009-11-18 13:30 UTC (permalink / raw)
  To: George Dennie; +Cc: git, torvalds

George Dennie wrote:
>
> Instead, Git is treating a manually maintained list of files within the
> working tree as the versioned document, this list being initialized and
> manually amended by the "Git add/rm/mv" commands, etc. 

This feature is called the "index", and is not merely a list of the
files, but also their content.  Please read

  http://tomayko.com/writings/the-thing-about-git

for a nice explanation why this is a good and useful thing.

> 	"Git commit -x"   -- performs a "Git add ." then a "Git commit"
> 	"Git checkout -x" -- that clean the working tree prior to perform a checkout

That would require supernaturally good maintenance of your .gitignore
to avoid adding or (worse) nuking files by accident.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: Hey - A Conceptual Simplication....
  2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie
                   ` (2 preceding siblings ...)
  2009-11-18 13:30 ` Thomas Rast
@ 2009-11-18 13:31 ` Jason Sewall
  2009-11-18 20:36 ` Linus Torvalds
  4 siblings, 0 replies; 25+ messages in thread
From: Jason Sewall @ 2009-11-18 13:31 UTC (permalink / raw)
  To: George Dennie; +Cc: git, torvalds

On Wed, Nov 18, 2009 at 7:55 AM, George Dennie <gdennie@pospeople.com> wrote:
>
> In particular, why is Git not treating the entire working tree as the
> versioned document (qualified of course by the .gitignore file).
>
> Instead, Git is treating a manually maintained list of files within the
> working tree as the versioned document, this list being initialized and
> manually amended by the "Git add/rm/mv" commands, etc.

Isn't fastidiously maintaining a .gitignore file to contain everything
you *don't* want in the project more confusing than explicitly
specifying things you *do* want in the project?

> The result is conceptual complexity and rather counter-intuitive behavior.
> For example, adding and renaming files outside of Git is not considered
> editing the version until you subsequently do a "Git Add ." Contrast that
> with editing or deleting files outside of Git. Yet adding and renaming files
> and folders is a significant part of substantive projects, especially in the
> early stages and experimental branches.
>
> Granted, this is not a big deal functionally, but what is being lost is
> conceptual simplicity (and consistency, in my book) and conceptual
> simplicity is a key value point, if not THE key.

In fact, it's a big deal in functionality: the utility is in being
able to specify *exactly* what you want to be part of each commit,
down to the line. This means that each commit can be extremely
fine-grained and represent a specific bug fix or feature.

If you have a bunch of debugging code sitting around in your working
tree after you've tracked down a problem, you don't want to commit all
of those printfs, etc. - you want to commit the fix. This has
ramifications from making diffs of history cleaner to making git
bisect actually useful.
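
A small sketch of that situation, with invented file names: the fix
lives in calc.c, the debugging noise in main.c, and only the fix is
staged. For changes mixed within a single file, 'git add -p' does the
same thing hunk by hunk, but it is interactive and so not shown here.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'int answer(void) { return 41; }' > calc.c
echo 'int main(void) { return 0; }' > main.c
git add calc.c main.c
git commit -q -m "initial"

echo 'int answer(void) { return 42; }' > calc.c   # the actual fix
echo '/* debug */ puts("got here");' >> main.c    # leftover debugging code

git add calc.c                                    # stage only the fix
staged=$(git diff --cached --name-only)           # what the commit will contain
unstaged=$(git diff --name-only)                  # what stays behind
git commit -q -m "Fix answer()"
in_commit=$(git diff-tree --no-commit-id --name-only -r HEAD)

echo "staged: $staged, unstaged: $unstaged, committed: $in_commit"
```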

> Also can we augment checkout to totally CLEAN the working directory prior to
> a restore. If necessary we can augment .gitignore to stipulate those files
> or folders that should be excluded from the cleaning. This suggestion is in
> recognition of the fact that if you  are not versioning the file, it is
> typically trash; which becomes the case when the entire working treat is
> treated as the versioned document.

This is even worse. It's already pretty easy to trash your working
directory by reflexively typing git checkout -f, and you want to

> Consequently, I recommend the following new commands:
>        "Git commit -x"   -- performs a "Git add ." then a "Git commit"
>        "Git checkout -x" -- that clean the working tree prior to perform a
> checkout

I see that Jan has replied with some loaded guns, *ahem* aliases. Go
ahead and use them, but I recommend you look at the diffs in git.git
or some other repository that takes advantage of making commits as
compact as possible, and learn how to use git add -p.

Jason


* RE: Hey - A Conceptual Simplication....
  2009-11-18 13:25 ` Jan Krüger
@ 2009-11-18 18:51   ` George Dennie
  2009-11-18 19:40     ` Jakub Narebski
  2009-11-20  1:35     ` Dmitry Potapov
  0 siblings, 2 replies; 25+ messages in thread
From: George Dennie @ 2009-11-18 18:51 UTC (permalink / raw)
  To: 'Jan Krüger'; +Cc: git

Thanks Jan, Jason, Jonathan, and Thomas for your responses; your thoughts and
concerns are enlightening....

Jan Krüger wrote...
> git config --global alias.commitx '!git add . && git commit'
> git config --global alias.checkoutx '!git clean && git checkout'

Thank you. Being new to git, I did not know that such aliasing was available
within it.

Jason Sewall wrote...
> If you have a bunch of debugging code sitting around in your working tree
> after you've tracked down a problem, you don't want to commit all of those
> printfs, etc. - you want to commit the fix. This has ramifications from
> making diffs of history cleaner to making git bisect actually useful.

One of the concerns I have with the manual pick-n-commit is that you can
forget a file or two. Consequently, unless you do a clean checkout and test
of the commit, you don't know that your publishable version even compiles.
It seems safer to commit the entirety of your work in its working state and
then do a clean checkout from a dedicated publishable branch and manually
merge the changes in that, test, and commit.

It seems the intuitive model is to treat version control as applying to the
whole document, not parts of it. In this respect the document is defined by
the IDE, namely the entire solution, warts and all. When you start
selectively saving parts of the document, you are doing two things at the
same time: versioning and publishing. This was a critical flaw in older
version control approaches, because the software solution document is a
file system sub-tree.

What you termed the debugging/printf's I would treat as a distinction
between a debug and a release version, which may be suitably delineated by
#define's or, preferably, separate unit test assemblies. If I must prune
prior to committing, however, then it seems reverting spurious printf's may
offer a more reliable and automatable technique than ensuring that I have
added all the new class files, resource files, text files, sub projects,
etc. that may constitute the "fix." Once so selectively reverted, I can test
and commit such a publishable version.

Jason Sewall wrote...
> Isn't fastidiously maintaining a .gitignore file to contain everything
> you *don't* want in the project more confusing than explicitly specifying
> things you *do* want in the project?

This is git ignore for "cleaning prior to a checkout" and git ignore for
"adding to the index", and it is not an either/or. You would specify what you
don't want version-tracked as normal, but you could also stipulate what you
don't want deleted during a clean restore (which should otherwise completely
wipe the folder prior to restoring a specific commit). This would permit
embedding non-versioned elements within the version tree for whatever reason
you find necessary.

Thomas Rast wrote...
> That would require supernaturally good maintenance of your .gitignore to
> avoid adding or (worse) nuking files by accident.

On the contrary, the approach would all but eliminate the possibility of
data loss, since you would not be pruning manually (and therefore
error-pronely) until after a commit. In fact, one might default to automatic
commits (if required) prior to checkouts, or at least to an alert when
uncommitted changes exist.

Thanks again for your input.


* Re: Hey - A Conceptual Simplication....
  2009-11-18 18:51   ` George Dennie
@ 2009-11-18 19:40     ` Jakub Narebski
  2009-11-18 19:52       ` Jason Sewall
  2009-11-20  1:35     ` Dmitry Potapov
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Narebski @ 2009-11-18 19:40 UTC (permalink / raw)
  To: George Dennie; +Cc: 'Jan Krüger', git

"George Dennie" <gdennie@pospeople.com> writes:

> Thanks Jan, Jason, Jonathan, and Thomas for your response, your thoughts and
> concerns are enlightening....
 
> Jason Sewell wrote...
>
> > If you have a bunch of debugging code sitting around in your working tree
> > after you've tracked down a problem, you don't want to commit all
> > of those printfs, etc. - you want to commit the fix. This has
> > ramifications from making diffs of history cleaner to making git
> > bisect actually useful.
> 
> One of the concerns I have with the manual pick-n-commit is that you can
> forget a file or two.

I don't think that this concern is valid.  

The files which make up a project are those defined in the Makefile or
equivalent project file, _not_ all files (or even all files of a
specific type / extension) that happen to reside in a given
directory.  And those files should be known to git, either added when
importing the project into git, or added when they were created.  And if
they are known, it is enough to use "git commit -a" to pick up all
changes.

So I don't see how you can 'forget a file or two'.

Are those *theoretical* concerns, or is it something that happened to
you while using git?

> Consequently, unless you do a clean checkout and test
> of the commit, you don't know that your publishable version even compiles.
> It seems safer to commit the entirety of your work in its working state and
> then do a clean checkout from a dedicated publishable branch and manually
> merge the changes in that, test, and commit.

That's what

  git stash --keep-index

is for.  

That, and a continuous integration repository, with its hooks.
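
A rough sketch of the stash step, in a scratch repository with
placeholder file names: after 'git stash --keep-index' the working tree
holds only what is staged, so a build or test run at that point sees
exactly what the commit will contain.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo v1 > lib.txt
echo v1 > notes.txt
git add lib.txt notes.txt
git commit -q -m "initial"

echo v2 > lib.txt        # the change meant for the next commit
echo wip >> notes.txt    # unrelated work in progress
git add lib.txt

git stash -q --keep-index        # set aside everything that is not staged
lib_during=$(cat lib.txt)        # staged change is still in the working tree
notes_during=$(cat notes.txt)    # work in progress is temporarily gone
# ... build and run the tests here ...

git commit -q -m "Update lib"
git stash pop -q                 # bring the work in progress back
echo "during test: lib=$lib_during notes=$notes_during"
```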

> 
> It seems the intuitive model is to treat version control as applying to the
> whole document, not parts of it. In this respect the document is defined by
> the IDE, namely the entire solution, warts and all.

Yes, and an IDE has a project file which defines which files are in the
project, just like a version control system has its tracked files.

> When you start
> selectively saving parts of the document then you are doing two things,
> versioning and publishing; and at the same time. This was a critical flaw in
> older version control approaches because the software solution document is a
> file system sub-tree.

Atomic commits are important, but the distinction between tracked
files, (untracked) ignored files, and files in "limbo" state (neither
tracked nor ignored) is orthogonal to having atomic commits.

> Jason Sewell wrote...
>
> >  Isn't fastidiously maintaining a .gitignore file to contain
> > everything you *don't* want in the project more confusing than
> > explicitly specifying things you *do* want in the project?  
> 
> This is git ignore for "cleaning prior to a check" and git ignore for
> "adding to index" and is not an either or. You would specify what you don't
> want to version tracked as normal but you can also stipulate what you don't
> want to be deleted during a clean restore (which should otherwise completely
> wipe the folder prior to restoring a specific commit). This would permit
> embedding non-version elements within the version tree for whatever reason
> you find necessary.

And this is supposedly easier to use?  I don't think so.

> Thomas Rast wrote...
>
> > That would require supernaturally good maintenance of your
> > .gitignore to avoid adding or (worse) nuking files by accident.
> 
> On the contrary, the approach would all but eliminate the possibility of
> loss of data since you would not manually (and therefore error prone-ingly)
> pruning until after a commit. In fact, one might default automatic commits
> (if required) prior to checkouts or at least an alert system when
> uncommitted changes exists.

What?  I cannot understand you here.

I think that automatic pruning of non-versioned files is _more_ error
prone than manual deletion of files.  And much more error prone than
just keeping non-ignored and non-tracked files.

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: Hey - A Conceptual Simplication....
  2009-11-18 19:40     ` Jakub Narebski
@ 2009-11-18 19:52       ` Jason Sewall
  2009-11-19  2:03         ` George Dennie
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Sewall @ 2009-11-18 19:52 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: George Dennie, Jan Krüger, git

Sorry for the 2x post, George; forgot to include the list in my reply....

On Wed, Nov 18, 2009 at 1:51 PM, George Dennie <gdennie@pospeople.com> wrote:
[some cleanup of quote line wrapping]
> Jason Sewall wrote...
>> If you have a bunch of debugging code sitting around in your
>> working tree after you've tracked down a problem, you don't want to
>> commit all of those printfs, etc. - you want to commit the
>> fix. This has ramifications from making diffs of history cleaner to
>> making git bisect actually useful.

> One of the concerns I have with the manual pick-n-commit is that you
> can forget a file or two. Consequently, unless you do a clean
> checkout and test of the commit, you don't know that your
> publishable version even compiles.  It seems safer to commit the
> entirety of your work in its working state and then do a clean
> checkout from a dedicated publishable branch and manually merge the
> changes in that, test, and commit.

I find git status very useful in preparing a commit; untracked (and
'un-ignored') files are listed right there and I can see if there are new
source files that are present but not tracked.  You could even add
a 'pre-commit hook' to make sure that you don't have any untracked *.c
(or whatever) files before you actually make the commit.
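
Such a hook might look roughly like this. It is only a sketch: the
*.c pattern and the wording of the message are arbitrary choices, and
the surrounding commands just set up a scratch repository to try it in.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'int a;' > a.c
git add a.c
git commit -q -m "initial"

mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Refuse to commit while untracked, non-ignored *.c files exist.
untracked=$(git ls-files --others --exclude-standard -- '*.c')
if [ -n "$untracked" ]; then
    echo "Untracked source files; add or ignore them first:" >&2
    echo "$untracked" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo 'int b;' > b.c          # new source file, not yet added
echo 'int a = 1;' > a.c

if git commit -q -a -m "forgot b.c" 2>/dev/null; then first=accepted; else first=rejected; fi
git add b.c
if git commit -q -a -m "with b.c"; then second=accepted; else second=rejected; fi
echo "first attempt: $first, second attempt: $second"
```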

As to 'publishable' version, it's probably a good idea to run 'make
distcheck' or the equivalent before making a release anyway.

> It seems the intuitive model is to treat version control as applying
> to the whole document, not parts of it. In this respect the document
> is defined by the IDE, namely the entire solution, warts and
> all. When you start selectively saving parts of the document then
> you are doing two things, versioning and publishing; and at the same
> time. This was a critical flaw in older version control approaches
> because the software solution document is a file system sub-tree.

I find this leads to big, shapeless commits and, as I mentioned
before, it seriously limits the utility of 'git bisect'.  I also fail
to see how 'selectively saving parts of the document' is versioning
and publishing - what is the publishing part?  The act of committing
is one thing (and 'saving parts of the document' is one conceivable
name for it) and publishing another.  Your workflow may vary, but
before actually 'publishing' (perhaps pushing out to a public repo, or
merging into a public branch), it's probably a good idea to test the
code with whatever system you use anyway.

> What you termed the debugging/printf's I would treat as a
> distinctions between a debug vs. a release version that may be
> suitably delineated by #define's or preferably separate unit tests
> assemblies. If I must prune prior to committing; however, then it
> seems reverting spurious printf's may offer a more reliable and
> automatable technique than ensuring that I have added all the new
> class files, resource files, text files, sub projects, etc; that may
> constitute the "fix." Once so selectively reverted I can test and
> commit such a publishable version.

What if you are hacking away and make changes to several parts of the
code at once?  Making the commits as fine-grained as possible makes it
easier to cherry-pick, bisect, and understand the history.

As to debugging code, I admit I sometimes will use git gui or git add
-p to stage just what I want and then put whatever is 'left over' in a
branch that I might use again later if another bug comes up.  Then I
can reset --hard my 'working' branch and the debugging code is gone.

> Jason Sewell wrote...
>>  Isn't fastidiously maintaining a .gitignore file to contain
>> everything you *don't* want in the project more confusing than
>> explicitly specifying things you *do* want in the project?
>
> This is git ignore for "cleaning prior to a check" and git ignore
> for "adding to index" and is not an either or. You would specify
> what you don't want to version tracked as normal but you can also
> stipulate what you don't want to be deleted during a clean restore
> (which should otherwise completely wipe the folder prior to
> restoring a specific commit). This would permit embedding
> non-version elements within the version tree for whatever reason you
> find necessary.

Perhaps I don't understand your scheme, but it sounds like you're
advocating 2 .gitignores:

* .gitignore_track; with everything you don't want automatically staged
  but which can be trashed by your cleaning checkout
* .gitignore_keep; with things you don't want staged but which
  shouldn't be deleted by git during cleaning

That seems even more confusing.  I'm actually having trouble seeing
why you want this untracked-file nuking checkout at all.  Care to give
an example?

> Thomas Rast wrote...
>> That would require supernaturally good maintenance of your
>> .gitignore to avoid adding or (worse) nuking files by accident.
>
> On the contrary, the approach would all but eliminate the
> possibility of loss of data since you would not manually (and
> therefore error prone-ingly) pruning until after a commit. In fact,
> one might default automatic commits (if required) prior to checkouts
> or at least an alert system when uncommitted changes exists.

Who is pruning after a commit?  One nice thing about checkout is that
it will refuse to move to a different commit if there are files that
will get trashed.  Then you can say 'oops, I should stash/commit/nuke
that stuff before I change HEAD'.
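
To see that refusal in a scratch repository (the branch name 'other'
and the file name are made up for the sketch):

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo v1 > file.txt
git add file.txt
git commit -q -m "one"
first=$(git symbolic-ref --short HEAD)   # whatever the initial branch is called

git checkout -q -b other
echo v2 > file.txt
git commit -q -a -m "two"
git checkout -q "$first"

echo 'local edit' > file.txt             # uncommitted change to a file that differs on 'other'
if git checkout -q other 2>/dev/null; then result=switched; else result=refused; fi

content=$(cat file.txt)                  # the local edit is still there
now=$(git symbolic-ref --short HEAD)     # and HEAD has not moved
echo "checkout $result; still on $now with: $content"
```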

Jason


* Re: Hey - A Conceptual Simplication....
  2009-11-18 12:55 Hey - A Conceptual Simplication George Dennie
                   ` (3 preceding siblings ...)
  2009-11-18 13:31 ` Jason Sewall
@ 2009-11-18 20:36 ` Linus Torvalds
  4 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2009-11-18 20:36 UTC (permalink / raw)
  To: George Dennie; +Cc: git



On Wed, 18 Nov 2009, George Dennie wrote:
> 
> The Git model does not seem to go far enough conceptually, for some
> unexplainable reason...

Others already mentioned this, but the concept you missed is the git 
'index', which is actually very central (it is actually the first part of 
git written, before even the object database) but is something that most 
people who get started with git can (and do) ignore.

Now, admittedly, for casual use it's not always clear _why_ the index is 
so central, so the fact that you overlooked it is certainly easy to 
understand. Just take my word for it: to truly understand git, you do need 
to understand the index.

You can ignore it for a long time, because one of the primary reasons for 
it existing is about performance. That happens to be a primary goal of 
git, of course, but some people always think it's "just performance". It's 
way more fundamental than that.

So the way you can start getting used to the index is to think of it as a 
way to avoid having to do a full 'readdir()' on the whole tree to figure 
out what is in there, and avoiding having to read all the files to check 
that their contents still match.

Of course, if that was _all_ the index did, it could be seen purely as a 
cache, and have no semantic visibility at all. And that's not the case: 
the index does have real semantic visibility.

The first time you'll see it is when you decide to stage your changes in 
parts. The index is what allows you to _not_ always commit all your 
changes exactly because git keeps track of something more than _just_ your 
whole current working tree.
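
A minimal sketch of that "something more", with an invented file: the
same path can have three different contents at once, one in the last
commit, one in the index, and one in the working tree, and a plain
'git commit' records the index version.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'a\n' > f.txt
git add f.txt
git commit -q -m "initial"

printf 'a\nb\n' > f.txt
git add f.txt                          # index now holds: a, b
printf 'a\nb\nc\n' > f.txt             # working tree holds: a, b, c

head_blob=$(git rev-parse HEAD:f.txt)  # content id in the last commit
index_blob=$(git rev-parse :f.txt)     # content id staged in the index
work_blob=$(git hash-object f.txt)     # content id of the file on disk
git ls-files --stage f.txt             # mode, blob id, stage number, path

git commit -q -m "add b"               # commits what is in the index
git show HEAD:f.txt                    # a, b (the line 'c' is still uncommitted)
```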

A special case (but a really useful one) of the "staging your changes in 
parts" is when you do merges. Now, most people don't do merges like I do 
(what, average of 5 merges per day, day in and day out), so most people 
don't care quite as deeply as I do, but if you ever do a merge where 99% 
merged cleanly, and 1% did not (which is the common case for conflicts), 
you'll really understand why having a system that keeps track of the parts 
that merged cleanly is _critical_. 

So for merges, the index keeps track of what merged cleanly, and what 
didn't, and what the original state for the not-clean stuff was. And as 
somebody who probably does more merges than likely any other human in the 
history of the world, I can state with some authority that any source 
control model that doesn't have this is fundamentally broken.
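
A small sketch of what that looks like from the command line, using a
scratch repository with one file that conflicts and one that merges
cleanly (file and branch names are invented). After the merge stops, the
conflicted path has three index entries, stages 1, 2 and 3 for the
common ancestor and the two sides, while the clean path sits at stage 0
already merged.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'line\n' > clash.txt
printf 'x\n' > clean.txt
git add clash.txt clean.txt
git commit -q -m "base"
base_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b side
printf 'side\n' > clash.txt
printf 'x\ny\n' > clean.txt
git commit -q -a -m "side"

git checkout -q "$base_branch"
printf 'main\n' > clash.txt
git commit -q -a -m "main"

git merge side >/dev/null 2>&1 || true   # expected to stop on clash.txt

git ls-files -u                          # unmerged entries: stages 1, 2, 3
unmerged_count=$(git ls-files -u | wc -l | tr -d ' ')
clean_stage=$(git ls-files -s clean.txt | awk '{print $3}')
echo "unmerged entries: $unmerged_count, clean.txt stage: $clean_stage"
```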

So the index is really _really_ important. Even if you can ignore it most 
of the time. And the index is why you don't have a model of "always just 
track the exact tree state".

			Linus


* RE: Hey - A Conceptual Simplication....
  2009-11-18 19:52       ` Jason Sewall
@ 2009-11-19  2:03         ` George Dennie
  2009-11-19  7:42           ` Björn Steinbrink
                             ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: George Dennie @ 2009-11-19  2:03 UTC (permalink / raw)
  To: 'Jason Sewall', 'Jakub Narebski'
  Cc: 'Jan Krüger', git

Thanks Linus, Jason, and Jakub...

Linus Torvalds wrote....
>On Wed, 18 Nov 2009, George Dennie wrote:
>> 
>> The Git model does not seem to go far enough conceptually, for some 
>> unexplainable reason...
>
> Others already mentioned this, but the concept you missed is the git 'index', which is actually very 
> central (it is actually the first part of git written, before even the object database) but is something 
> that most people who get started with git can (and do) ignore.

Uhmmm, subtle. I hear you. Thanks for the heads up. But before that, I just put these two cents down...

One of the persistent problems with software documentation is that it often fails to define the "functional or usage" model, apart from a dry list of commands. I am sure there are many good reasons for this. For one thing, explaining stuff is hard. Now, I have not had occasion to do merges, as such, so I am finding the justification for the index vague. I am wondering whether this might be a great place to describe the functional model of git in a way that more clearly justifies the index...

Specifically, can there be a succinct description of the usage or functional model of Git that necessarily incorporates the index?

For example, the functional notion of the repository seems well defined: a growing web of immutable commits each created as either an isolated commit or more typically an update and/or merger of one or more pre-existing commits. 

With such a description the rest of the structure becomes almost implicit: Commits may be annotated, such as with release number labels. Commits that have not been linked to, such as by an update or merger, remain dangling like loose threads in the web and are called branches. Branches may be given special labels that the repository will then automatically update so as to refer to the latest commit to that branch.
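
That description can be checked against a scratch repository. The sketch below uses made-up names ('topic', 'v0.1'): it builds two lines of work, merges them, and then asks git about the pieces. The merge commit lists two parents, the tag stays on the commit it was given, and the branch label has moved to the newest commit.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo 1 > f && git add f && git commit -q -m "first"
main=$(git symbolic-ref --short HEAD)
git tag v0.1                           # a label pinned to one commit

git checkout -q -b topic
echo 2 > g && git add g && git commit -q -m "on topic"

git checkout -q "$main"
echo 3 > h && git add h && git commit -q -m "on main"
git merge -q --no-edit topic           # a commit with two parents

# Merge commit id followed by its parent ids: three words in total.
words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
tag_target=$(git rev-parse v0.1)
root=$(git rev-list --max-parents=0 HEAD)
branch_tip=$(git rev-parse "$main")

git log --graph --oneline --decorate --all
```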

I don't yet have such a clear model for the index. Yes it is a staging platform, but so is the IDE....I'll do more reading.

Jason Sewell wrote....
> I find this leads to big, shapeless commits and, as I mentioned before, it seriously limits the utility 
> of 'git bisect'.  I also fail to see how 'selectively saving parts of the document' is versioning and 
> publishing - what is the publishing part?  The act of committing is one thing (and 'saving...

The notion of a shapeless commit is curious. Intuitively, I consider a commit as capturing the state of my work at a transactional boundary (i.e. a successful unit test...or even lunch break). However, your characterization of "shape" suggests that you are constructing something other than the immediate functionality of the software. Consequently, your software document is not really the solution files alone but also this commit history that you meticulously craft.

Further, the participation of the IDE is not to compose within itself the committable document but rather to contribute to such a document in pieces. In fact, the closest metaphor to this process/workflow seems to be submitting articles to a magazine, except that you are both the writer and the editor/graphic artist, with each edition of the magazine becoming the committable version.

With this metaphor the index does play a clear role as a layout board of sorts for the complete magazine. And also clearly, the IDE does not "functionally" edit the entire committable document but rather parts of it. Even though it may effectively have the entirety of the index in its working tree, Git requires that it be submitted to the index, which is the true committable document. 

It raises the question: why is the working tree (the IDE document) so closely tied to the repository, since it really amounts to a scratch pad? In fact, while the index may be attached to the working tree, the repository can be anywhere and have more than one index attached...yeah, I know, having a personal dedicated repository is cheap. (A great example of how expediency, the proximity of the repository, might obscure the functional model by making what is arbitrary and due to convention appear a functional necessity...; if, in fact, my above conclusion is correct of course :)

> What if you are hacking away and make changes to several parts of the code at once?  Making the commits 
> as fine-grained as possible makes it easier to cherry-pick, bisect, and understand the history.

You know Jason, it is often hard to isolate my changes to specific files. I have come to appreciate unit tests as a means of delineating changes. However, clearly the historical record of your solution tree is of substantial value to you. It is something I will have to pay closer attention to in my case.

> Perhaps I don't understand your scheme, but it sounds like you're advocating 2 .gitignores:
>
> * .gitignore_track; with everything you don't automatically staged but  which can be trashed by your cleaning checkout
> * .gitignore_keep; with things you don't want staged but which shouldn't be deleted by git during cleaning

Yep, that may be one implementation...but essentially the current .gitignore lists exclusionary filters for the "git add ." command. The suggestion was to augment it to also include exclusionary filters for the proposed "git checkout -clean" command.  By perhaps prefixing "+" and "-" symbols to the listed elements you can designate each filter's participation in the "do not add" and "do not delete" activities, respectively. However, this suggestion was made with the presumption that the work tree was the committable document, but clearly it is not.

> Who is pruning after a commit?  Once nice thing about checkout is that it will refuse to move to a 
> different commit if there are files that will get trashed.  Then you can say 'oops, I should 
> stash/commit/nuke that stuff before I change HEAD.

Not trashing files is a nice thing for checkout to do. However, are you referring to changes added to the index or changes made in the working tree but not yet added to the index? Based on my current understanding of the functional model, you would be referring to the index since the working tree is little more than a scratch pad. The pruning comment was in recognition that the working tree was not expected to be committable in its entirety.

George.

Thanks again for your input and if you have the time I welcome your response.

* Re: Hey - A Conceptual Simplication....
  2009-11-19  2:03         ` George Dennie
@ 2009-11-19  7:42           ` Björn Steinbrink
  2009-11-19 20:12             ` George Dennie
  2009-11-19 10:27           ` Jakub Narebski
  2009-11-20  1:48           ` Dmitry Potapov
  2 siblings, 1 reply; 25+ messages in thread
From: Björn Steinbrink @ 2009-11-19  7:42 UTC (permalink / raw)
  To: George Dennie
  Cc: 'Jason Sewall', 'Jakub Narebski',
	'Jan Krüger',
	git

On 2009.11.18 21:03:31 -0500, George Dennie wrote:
> Jason Sewell wrote....
> > I find this leads to big, shapeless commits and, as I mentioned
> > before, it seriously limits the utility of 'git bisect'.  I also
> > fail to see how 'selectively saving parts of the document' is
> > versioning and publishing - what is the publishing part?  The act of
> > committing is one thing (and 'saving...
> 
> The notion of a shapeless commit is curious. Intuitively, I consider a
> commit as capturing the state of my work at a transactional boundary
> (i.e. a successful unit test...or even lunch break). However, your
> characterization of "shape" suggest that you are constructing
> something other than the immediate functionality of the software.
> Consequently, your software document is not really the solution files
> alone but also this commit history that you meticulously craft. 

Your "lunch break" as a transaction boundary is a great example of
something that probably most people on this list would consider to
create commits that need rewriting before publishing them. Let's take an
extreme example:

You work on adding a feature to some webmail site that adds colors to
the mail being displayed, using different colors for the headers, quoted
sections and the text from the sender. The colors should be configurable
by the user.

*work*
git commit -m "Go for a coffee"
*work*
git commit -m "Lunch break"
*work*
git commit -m "Meeting"
*work*
git commit -m "Time to go home"

*come back to work*
*work*
git commit -m "Finished the mail coloring support"

This gives you:

* Finished the mail coloring support
|
* Time to go home
|
* Meeting
|
* Lunch break
|
* Go for a coffee

Such a history is basically completely useless. It's (ab)using the VCS
as a plain code dump. In a week, you'll be able to see that you had a
meeting that day, but it doesn't tell you anything about what you did to
the project. And even with less "insane" commit messages, the
"transactional boundaries" are totally arbitrary. They're aligned to
things you did that have absolutely nothing to do with the stuff you're
tracking in your VCS.

A far more useful history might look like this:

* Colorize quoted text in a mail, depending on its quoting depth
|
* Parse mails into a tree structure to represent sections of quoted text
|
* Colorize mail headers
|
* Add support for the user to change the colors used for mails
|
* Add configuration variable for the colors used for mails


At each step, something functionally changed about the software. The
commit messages tell you something about how the software evolved. And
if you get bogus values for the colors in the configuration, you can be
90% sure, by only looking at the commit messages, that you have a bug in
the "Add support for the user to change the colors ..." commit, and not
in one of the others. So you can run "git show $that_commit" to see the
diff of the changes you made in that commit and quickly check them for
your bug.

And while that's not sooo useful for commits that added new
functionality, it's extremely useful for commits that just made small
changes to existing functionality. Finding a bug in a large piece of
code (say 2000 lines) isn't trivial. But if you know that a commit that
changed 5 lines in that code is responsible for the breakage, all you
have to do is to identify the faulty change, which is a lot easier.

And with a large history, where it's not obvious in which commit
something got broken, "git bisect" can help to quickly find the bad
commit. Now consider "git bisect" finding your "Lunch break" commit.
Looking at the commit message tells nothing. The diff is pretty much
arbitrary, might be huge. Not much help. Finding the "Add support for
the user to change the colors ..." commit already tells you something
just because of the commit message. And the diff is about just one
specific change. It's all nicely separated, and that's a huge value.
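
A minimal sketch of that bisect workflow, assuming a throwaway repository
created with mktemp. The file name app.txt, the marker string "BUG" and the
commit messages are made up for illustration; a real test command would be
your build or unit tests.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q

# Six small commits; the fourth one introduces the (hypothetical) bug marker.
for step in 1 2 3 4 5 6; do
    if [ "$step" -eq 4 ]; then
        echo "BUG in color parsing" >> app.txt
    else
        echo "feature $step" >> app.txt
    fi
    git add app.txt
    git commit -q -m "Change number $step"
done

# The root commit is known to be good, the current tip is known to be bad.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# Exit status 0 means "good", 1 means "bad" for the commit being tested.
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

Because every commit here does one thing, the subject line that bisect
lands on already says what to look at.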

Using git and producing nice commits is about _documenting_ the history
of your code. And having small, self-contained and well separated
commits is key to that.


And the index can be a great help with that. Given the above example,
you might already have some code to use the configured colors, just for
testing, so things aren't so boring. Maybe even some hack-up of the
code you'll be using later. If that part of the code would be committed
right away, you'd mess up your commit, because it wouldn't be about a
single change anymore, but would also have your testing code in there.
Bad.

But you don't want to throw the testing code away either, because it's
useful right now, and you might need it later, because it might evolve
into the final code used for the actual coloring. So, what now? You have
code that you want to commit, and some code you don't want to commit,
and which needs to go away temporarily, so you can test without it. No
problem, here comes the index.

Say you have:
config.c     # Has changes for the colors
show_mail.c  # Has changes to use the colors
whatever.c   # Has some changes for both

You do:
git add config.c       # Add to the index
git add -p whatever.c  # Only add some hunks to the index

So now the index has what you want to commit, and the working tree still
has everything.

git stash save --keep-index

Now your working tree and index only have the things you want to commit.
You run your unit tests, everything's fine. You commit and get a nice
clean commit, for which you write a useful commit message.

git stash pop

You've got your changes back that you didn't want to commit just yet,
and you can continue working.
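
The same sequence as a self-contained sketch that can be run in a scratch
directory. It assumes a reasonably recent git (it uses "git stash push",
the newer spelling of "git stash save"), leaves out the interactive
"git add -p" step, and uses placeholder file contents.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "base config" > config.c
echo "base display" > show_mail.c
git add config.c show_mail.c
git commit -q -m "Initial import"

# Two unrelated edits in the working tree.
echo "color settings" >> config.c            # ready to commit
echo "experimental coloring" >> show_mail.c  # testing code, not ready

git add config.c                   # stage only what belongs in the commit
git stash push -q --keep-index     # hide everything that is not staged

# The working tree now matches the index, so tests see what will be committed.
during_test=$(cat show_mail.c)

git commit -q -m "Add configuration variable for the colors used for mails"
git stash pop -q                   # bring the testing code back

committed_files=$(git diff-tree --no-commit-id --name-only -r HEAD)
after_pop=$(tail -n 1 show_mail.c)
echo "committed: $committed_files; restored line: $after_pop"
```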


Another use-case I have found for myself is to use the index to separate
reviewed and not-yet-reviewed changes. Before I commit, I always review
the diff of the things I'm going to commit. So I start out with "git
diff" and start reading. When I finished reviewing a file, I can do "git
add $that_file", so the diff for that file will no longer be shown by
"git diff". That nicely cuts down the size of the "git diff" output to
things I'm still interested in. Quite useful when you are forced to do a
large commit, because you did some refactoring. If I find a bug during
the review, I can fix that and re-run "git diff", which will only show
changes to me that I didn't declare as "good" already by adding them to
the index.
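
A small sketch of that review loop, again in a throwaway repository. The
file names parser.c and render.c are hypothetical stand-ins for whatever
you are reviewing.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "one" > parser.c
echo "one" > render.c
git add parser.c render.c
git commit -q -m "Initial import"

# A refactoring that touches both files.
echo "two" >> parser.c
echo "two" >> render.c

git diff --name-only               # lists both files: nothing reviewed yet

git add parser.c                   # parser.c has been read and declared good

still_to_review=$(git diff --name-only)           # index vs. working tree
already_reviewed=$(git diff --cached --name-only) # HEAD vs. index
echo "to review: $still_to_review; reviewed: $already_reviewed"
```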


Sure, it takes some practice and discipline to generate a nice, useful
history. But that's not much different from writing code. Others will
hate you for writing unreadable spaghetti code, and so will they hate
you for producing a useless history that tells them that you had lunch,
instead of telling them what you did to the code ;-)

Björn

* Re: Hey - A Conceptual Simplication....
  2009-11-19  2:03         ` George Dennie
  2009-11-19  7:42           ` Björn Steinbrink
@ 2009-11-19 10:27           ` Jakub Narebski
  2009-11-20  1:48           ` Dmitry Potapov
  2 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2009-11-19 10:27 UTC (permalink / raw)
  To: George Dennie; +Cc: 'Jason Sewall', 'Jan Krüger', git

Side-note: you are employing very strange line wrapping... you should
word wrap your lines so they do not exceed 70-76 characters, and you
should not (except when required for readability) rewrap quoted text.

On Thu, 19 Nov 2009, George Dennie wrote:
> Thanks Linus, Jason, and Jakub...
> 
> Linus Torvalds wrote....
>>On Wed, 18 Nov 2009, George Dennie wrote:
>>> 
>>> The Git model does not seem to go far enough conceptually, for some 
>>> unexplainable reason...
>>
>> Others already mentioned this, but the concept you missed is the git
>> 'index', which is actually very central (it is actually the first
>> part of git written, before even the object database) but is
>> something that most people who get started with git can (and do)
>> ignore. 
> 
> Uhmmm, subtle. I hear you. Thanks for the heads up. But before that,
> I just put these two cents down... 

> [...] Now, I have not had occasions to do merges, as such. So I am
> finding the justification for the index vague. [...]

Errr... you didn't do any merges?  What is your experience with
using version control, then?


As for using the index during a merge: a merge joins two (or more)
lines of history (lines of development), bringing the contents of
another branch into the current branch.  Some of the changes are
independent, for example if one branch changed one file and the other
branch changed another file.  This is a so-called trivial merge, an
example of a tree-level merge.  Even if the merged branches touch the
same file, git can merge the changes (using the three-way merge /
diff3 algorithm) if they were made in separate sections of the file.

The problem starts if there are changes which touch the same sections
of a file.  This generates a so-called merge conflict (contents
conflict), and you have to resolve such a conflict manually.

During a merge the index helps to manage information about the parts
that are not yet merged.  Let's assume for example that you made a
mistake in the merge resolution of some file, and you want to scratch
your attempt and try it anew.  Without the index it would be very hard
to do so without trashing the resolutions of other conflicts.
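
A sketch of that situation, assuming two files that conflict in a
throwaway repository (the names a.txt and b.txt and the branch name
"side" are made up).  One conflict is resolved and staged, the other
resolution is botched and then redone with "git checkout -m", which
recreates the conflicted file from the unmerged entries in the index.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "base" > a.txt
echo "base" > b.txt
git add a.txt b.txt
git commit -q -m "Base"
trunk=$(git symbolic-ref --short HEAD)

git checkout -q -b side
echo "side" > a.txt
echo "side" > b.txt
git commit -q -a -m "Side changes"

git checkout -q "$trunk"
echo "trunk" > a.txt
echo "trunk" > b.txt
git commit -q -a -m "Trunk changes"

# Both files conflict, so the merge stops and reports failure.
git merge side >/dev/null 2>&1 || true

# Resolve a.txt properly and record the resolution in the index.
echo "resolved a" > a.txt
git add a.txt

# Botch b.txt in the working tree, then start over for that file only.
echo "oops" > b.txt
git checkout -q -m -- b.txt

markers=$(grep -c '^<<<<<<< ' b.txt)
still_unmerged=$(git ls-files -u | cut -f2 | sort -u)
a_content=$(cat a.txt)
echo "unmerged: $still_unmerged; a.txt: $a_content"
```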

> For example, the functional notion of the repository seems well
> defined: a growing web of immutable commits each created as either
> an isolated commit or more typically an update and/or merger of
> one or more pre-existing commits.

If by "web" you mean DAG (Directed Acyclic Graph) of commits, then
yes, it is _part_ of repository.

There are also refs (branches, tags, remote-tracking branches), which 
are also part of repository, very important part.  Those are named
references into DAG of commits.
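
To make "named references into the DAG" concrete, here is a sketch in a
scratch repository.  The tag name v0.1 and the branch name topic are
arbitrary choices for the example.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "one" > file.txt
git add file.txt
git commit -q -m "First"

git tag v0.1          # a tag: a name for this commit that stays put
git branch topic      # a branch: another name for the same commit

echo "two" >> file.txt
git commit -q -a -m "Second"   # only the checked-out branch moves

# Every ref is just a name plus the object it points at.
git for-each-ref --format='%(refname) %(objectname:short)'

tag_commit=$(git rev-parse "v0.1^{commit}")
topic_commit=$(git rev-parse topic)
head_parent=$(git rev-parse "HEAD^")
```

The tag and the topic branch still name the first commit, which is now
the parent of the commit the current branch points at.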


As to commits being created as update of existing commit or from 
scratch: that would depend on the way of development.  Merge commits
are much, much more rare than ordinary commits (especially that git 
favors fast-forwards by default when there is no need for merge).

> 
> With such a description the rest of the structure becomes almost
> implicit: Commits may be annotated such as with release number labels.
> Commits that have not been linked to such as by an update or merger
> remain dangling like loose threads in the web and are called branches.
> Branches may be given special labels that the repository will then
> automatically update so as to refer to the latest commit to that
> branch.      

Almost right.
 
> I don't yet have such a clear model for the index. Yes it is a staging
> platform, but so is the IDE....I'll do more reading. 

The index is the area where you prepare commits, if needed.  But you
don't need to care that there is something like the index; you can
prepare your commits in the working area.  But when you need it, it is
there.

-- 
Jakub Narebski
Poland

* RE: Hey - A Conceptual Simplication....
  2009-11-19  7:42           ` Björn Steinbrink
@ 2009-11-19 20:12             ` George Dennie
  2009-11-19 21:27               ` Junio C Hamano
                                 ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: George Dennie @ 2009-11-19 20:12 UTC (permalink / raw)
  To: git
  Cc: B.Steinbrink, 'Jason Sewall', 'Jakub Narebski',
	'Jan Krüger',
	torvalds

Thanks Jakub Narebski and Björn Steinbrink...Nice description Björn.

I think an important piece of conceptual information missing from the docs
is a concise list of the conceptual properties defining the context of the
working tree, index, and repository during normal use. This itemization
would go far in explaining the synergies between the various commands. 

Functionally, all the commands merely manipulate these properties. If these
properties were summarized in context, one would expect the summary to
represent a very complete functional model of Git. A user could review the
description, figure out what they wanted to do, and then find the
command(s) to accomplish it.

Presently this knowledge is accreted over time as opposed to merely being
read and "grokked" in the space of a few minutes (of course it could be that
I am particularly limited :).

For example, towards a functional model, is this close? (note: all
properties can be blank/empty)...

REPOSITORIES
	Collection of Commits
	Collection of Branches
		-- collection of commits without children
		-- as a result each commit either augments
		-- an existing branch or creates a new one
	Master Branch
		-- typically the publishable development history

INDEX
	Collections of Parent/Merge Commits
		-- the commit will use all these as its parent

	Staged Commit 
		-- these changes are shown relative to the working tree

	Default Branch
		-- the history the staged commit is supposed to augment

	Collection of Stashes
		-- these are not copies of the working tree since they
		-- only contain "versioned" files/folders and so are not
		-- a backup

WORKING_TREE
	Collection of Files and Folders
	

As far as I can tell, the working tree is not supposed to be stateful, but it
seems the commands treat it as such.

What is interesting is that branches serve to encourage a serialized view of
commits. More than structure, they are like books in a library narrating a
development story. Consequently, they are as much the purpose of the
repository as the commits they organize.


Again, thanks for your patients.

George.

* Re: Hey - A Conceptual Simplication....
  2009-11-19 20:12             ` George Dennie
@ 2009-11-19 21:27               ` Junio C Hamano
  2009-11-20  0:49               ` Jakub Narebski
  2009-11-20  2:31               ` Dmitry Potapov
  2 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2009-11-19 21:27 UTC (permalink / raw)
  To: George Dennie
  Cc: git, B.Steinbrink, 'Jason Sewall',
	'Jakub Narebski', 'Jan Krüger',
	torvalds

"George Dennie" <gdennie@pospeople.com> writes:

> REPOSITORIES
> 	Collection of Commits

Ok.

> 	Collection of Branches
> 		-- collection of commits without children

Wrong.

> 		-- as a result each commits either augments
> 		-- and existing branch or creates a new one

Ok.

> 	Master Branch
> 		-- typically the publishable development history

Not necessarily.

> INDEX
> 	Collections of Parent/Merge Commits
> 		-- the commit will use all these as its parent

Wrong.

> 	Staged Commit 
> 		-- these changes are shown relative to the working tree

A new word for me.  I doubt we need to have such a concept.

> 	Default Branch
> 		-- the history the staged commit is suppose to augment

We typically call it "the current branch".  It is "the branch whose tip
will advance by one commit when you make a new commit" and determined by
HEAD.

> 	Collection of Stashes
> 		-- these are not copies of the working tree since they
> 		-- only contain "versioned" files/folders and so is not
> 		-- a backup

I think it is better to say what these _are_, instead of saying what they
are not.  These are not yoghurt cups, these are not bicycles, these are
not knitting needles.  Listing what they are not does not give you more
information.

> WORKING_TREE
> 	Collection of Files and Folders

Ok.

> As far as I can tell, the working tree is not suppose to be stateful, but it
> seems the commands treat it as such.

I am not sure what you are trying to say by "stateful" here.  A work tree
has files and directories, and if you edit one of the files of course it
changes its state.

----------------------------------------------------------------

A branch is just a pointer to one commit (or nothingness, if it is unborn,
but that is such a special case you do not have to worry about yet until
you understand git more).

The commit can have many children, but you do not care about them when
looking at the branch, as there is no "parent-to-children" pointer.

The pointer that represents a branch moves to another commit by
different operations.

 - If you make a new commit while on the branch, it points to the new
   commit.  This is the most typical, and is done by many every-day
   commands, such as "commit", "am", "merge", "cherry-pick", "revert".

   Typically the new commit B is a direct child of the commit the branch
   used to point at A, and B has A as its first parent.

 - There are commands that let you violate the above, i.e. you can change
   what commit the branch pointer points at, and the new commit A does not
   have to be a direct child of the commit currently pointed by the
   branch.  "reset" and "rebase" are examples of such commands and are to
   rewrite the history.

There is the "current branch" that you are on.  It is recorded in HEAD
(cat .git/HEAD to see it).  When you create a new commit, the tip of the
branch HEAD points at is updated to point at the new commit.  Since the
new commit is made a direct child of the current commit, this will appear
to the users as "advancing the branch".
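
A sketch of the pointer moving, in a throwaway repository.  The commit
messages "A" and "B" mirror the letters used above; everything else is
made up for the example.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "one" > file.txt
git add file.txt
git commit -q -m "A"

head_ref=$(git symbolic-ref HEAD)    # the current branch, e.g. refs/heads/...
tip_a=$(git rev-parse "$head_ref")   # the commit that branch points at

echo "two" >> file.txt
git commit -q -a -m "B"              # "advancing the branch"

tip_b=$(git rev-parse "$head_ref")
parent_of_b=$(git rev-parse "$tip_b^")

# reset moves the same pointer somewhere else without making a commit.
git reset -q --hard "$tip_a"
tip_after_reset=$(git rev-parse "$head_ref")

echo "$head_ref: $tip_a -> $tip_b -> $tip_after_reset"
```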

The state (contents of files and symlinks together with where they are in
the tree) to be committed next is recorded in the index.  "git add" and
friends are used to update this state in the index, and "git diff" with
various options allow you to view the difference between this state and
work tree or arbitrary commit.
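
A sketch of those comparisons, assuming a one-file scratch repository
(notes.txt and its contents are placeholders).  One change is staged,
a second one is left in the work tree only.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "line 1" > notes.txt
git add notes.txt
git commit -q -m "Start notes"

echo "line 2" >> notes.txt
git add notes.txt             # the index now holds lines 1 and 2
echo "line 3" >> notes.txt    # the work tree holds lines 1, 2 and 3

# index vs. work tree: only the change that was not added yet
index_vs_worktree=$(git diff | grep '^+line')

# last commit vs. index: what the next "git commit" would record
head_vs_index=$(git diff --cached | grep '^+line')

# last commit vs. work tree: both changes together
head_vs_worktree=$(git diff HEAD | grep -c '^+line')

echo "$index_vs_worktree | $head_vs_index | $head_vs_worktree added lines"
```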

* Re: Hey - A Conceptual Simplication....
  2009-11-19 20:12             ` George Dennie
  2009-11-19 21:27               ` Junio C Hamano
@ 2009-11-20  0:49               ` Jakub Narebski
  2009-11-20  6:27                 ` Junio C Hamano
  2009-11-20  2:31               ` Dmitry Potapov
  2 siblings, 1 reply; 25+ messages in thread
From: Jakub Narebski @ 2009-11-20  0:49 UTC (permalink / raw)
  To: George Dennie
  Cc: git, B.Steinbrink, 'Jason Sewall',
	'Jan Krüger',
	torvalds

On Thu, 19 Nov 2009, George Dennie wrote:

> Thanks Jakub Narebski and Björn Steinbrink...Nice description Björn.
> 
> I think an important piece of conceptual information missing from the docs
> is a concise list of the conceptual properties defining the context of the
> working tree, index, and repository during normal use. This itemization
> would go far in explaining the synergies between the various commands.

If you didn't find sufficient description of underlying concepts behind
git in "Git User's Manual" (distributed with Git), "Git Community Book"
or "Pro Git", take a look at the following documents:

 * "Git for Computer Scientists"
 * "Git from the Bottom Up"
 * "The Git Parable"

> Functionally, all the commands merely manipulate these properties. If these
> properties were summarize in context one would expect that would represent a
> very complete functional model of Git. A user could review the description
> figure what they wanted to do and then find the command(s) to accomplish it.

I disagree.  While understanding underlying concepts of Git helps with
finding a way to get what one wants to achieve, I don't think that the way
presented here would work in practice.

> Presently this knowledge is accreted over time as oppose to merely being
> read and in the space of a few minutes "groked" (of course it could be that
> I am particularly limited :).

It is documented, see the references mentioned above.

> For example, towards a functional model, is this close? (note: all
> properties can be blank/empty)...
> 
> REPOSITORIES
> 	Collection of Commits

Directed Acyclic Graph of commits, where edges in the graph point from
a commit to its zero or more parents.

> 	Collection of Branches
> 		-- collection of commits without children

Errr... what?  A commit doesn't *have* [a pointer to its] children.  Also,
a branch can point to a commit for which there exists another commit that
has the given commit as a parent (up-to-date or fast-forward situation,
e.g.)


    a---b---c            <--- branch_a
             \
              \-d---e    <--- branch_b

Branches (or branch heads / branch tips) are named references into the DAG
of commits, points where the DAG of commits grows.
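
The picture above can be rebuilt in a scratch repository.  This is only
a sketch: the commit messages a to e and the file log.txt are made up
to match the diagram.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q

for name in a b c; do
    echo "$name" >> log.txt
    git add log.txt
    git commit -q -m "$name"
done
git branch -m branch_a          # name the current branch as in the diagram

git checkout -q -b branch_b     # starts at c, then grows on its own
for name in d e; do
    echo "$name" >> log.txt
    git add log.txt
    git commit -q -m "$name"
done

git log --graph --oneline --decorate --all

tip_a=$(git rev-parse branch_a)
two_before_b=$(git rev-parse "branch_b~2")
if git merge-base --is-ancestor branch_a branch_b; then
    relation="fast-forward"
else
    relation="diverged"
fi
echo "branch_a still points at c, which has a child: $relation"
```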

> 		-- as a result each commits either augments
> 		-- and existing branch or creates a new one

Commits do not create a new branch.  New commits must be created on an
existing branch (or on an unnamed branch aka detached HEAD, but that is
advanced usage).

> 	Master Branch
> 		-- typically the publishable development history

TANSTAAMB. There ain't such thing as a master branch. ;-)))))

Well, at least not in a sense of there being a branch that is a trunk
branch distinguished by _technical_ means.

> 
> INDEX
> 	Collections of Parent/Merge Commits
> 		-- the commit will use all these as its parent

No.  The index is the set of versions of files (blobs) that would go in
as the contents (tree) of the next commit (if you use "git commit", not
"git commit -a").

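The difference between the two forms, as a sketch in a throwaway
repository (the file names staged.txt and unstaged.txt are made up):

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "one" > staged.txt
echo "one" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "Initial import"

echo "two" >> staged.txt
echo "two" >> unstaged.txt
git add staged.txt

# Plain "git commit" records what is in the index.
git commit -q -m "Only the staged change"
first=$(git diff-tree --no-commit-id --name-only -r HEAD)

# "git commit -a" first updates the index from all tracked files.
git commit -q -a -m "Everything else that is tracked"
second=$(git diff-tree --no-commit-id --name-only -r HEAD)

echo "first commit: $first; second commit: $second"
```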
> 
> 	Staged Commit 
> 		-- these changes are shown relative to the working tree

Errr.... what?

> 
> 	Default Branch
> 		-- the history the staged commit is suppose to augment

Errr... what?

If by "default branch" you mean "current branch", it is the currently
checked out branch, where a new commit would go, pointed to by the HEAD
symbolic reference.


> WORKING_TREE
> 	Collection of Files and Folders
> 	
> 
> As far as I can tell, the working tree is not suppose to be stateful, but it
> seems the commands treat it as such.

Stateful?

Working tree / working area is a working area.  It can be disconnected from
the repository via core.worktree, the --work-tree option and the
GIT_WORK_TREE environment variable, see also
contrib/workdir/git-new-workdir
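
A sketch of a work tree that lives apart from its repository, using the
--git-dir and --work-tree options and the matching environment
variables.  The two directories come from mktemp and notes.txt is a
placeholder file.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

store=$(mktemp -d)    # holds the repository
tree=$(mktemp -d)     # holds the files being worked on, no .git inside
git init -q "$store"

echo "draft" > "$tree/notes.txt"
cd "$tree"

# Tell git explicitly where the repository and the work tree are.
git --git-dir="$store/.git" --work-tree="$tree" add notes.txt
git --git-dir="$store/.git" --work-tree="$tree" commit -q -m "Track notes"

# The environment variables do the same job as the options.
pending=$(GIT_DIR="$store/.git" GIT_WORK_TREE="$tree" git status --porcelain)

tracked=$(git --git-dir="$store/.git" ls-tree --name-only HEAD)
if [ -e "$tree/.git" ]; then has_dotgit=yes; else has_dotgit=no; fi
echo "tracked: $tracked; .git in work tree: $has_dotgit"
```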


> Again, thanks for your patients.

patience.

-- 
Jakub Narebski
Poland

* Re: Hey - A Conceptual Simplication....
  2009-11-18 18:51   ` George Dennie
  2009-11-18 19:40     ` Jakub Narebski
@ 2009-11-20  1:35     ` Dmitry Potapov
  2009-11-20  6:33       ` Junio C Hamano
  1 sibling, 1 reply; 25+ messages in thread
From: Dmitry Potapov @ 2009-11-20  1:35 UTC (permalink / raw)
  To: George Dennie; +Cc: 'Jan Krüger', git

On Wed, Nov 18, 2009 at 01:51:56PM -0500, George Dennie wrote:
> 
> One of the concerns I have with the manual pick-n-commit is that you can
> forget a file or two.

It is more difficult to make this mistake with Git than with many other
VCSes, because Git shows the list of files that are changed but not
committed as well as the list of untracked files when you try to commit
something. So, it has never been a real issue for me in practice...
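
For illustration, the same information is available from "git status".
A sketch in a throwaway repository, with made-up file names:

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "one" > tracked.c
git add tracked.c
git commit -q -m "Initial import"

echo "two" >> tracked.c      # changed but not staged
echo "new" > forgotten.c     # never added at all

# --porcelain prints one line per path; "??" marks untracked files.
status=$(git status --porcelain)
printf '%s\n' "$status"
```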

> Consequently, unless you do a clean checkout and test
> of the commit, you don't know that your publishable version even compiles.

If you want to be sure that a clean checkout will compile, the only
way to guarantee that is to do a clean checkout. Even if you commit all
files except those that are specified in .gitignore, it is not enough to
be sure that a clean checkout will compile... But in most cases, you
do not need to do that to be *reasonably* sure that a clean checkout
will compile later, and if you have any doubts, you can do a clean
checkout and testing _after_ committing your changes. There is no reason
to be afraid to commit something that may not work if you can amend that
later (until you publish your changes).
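
A sketch of fixing up an unpublished commit after noticing a forgotten
file.  The repository is a throwaway one and main.c / helper.c are
hypothetical names.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "base" > main.c
git add main.c
git commit -q -m "Initial import"

echo "feature" >> main.c
echo "helper" > helper.c
git add main.c
git commit -q -m "Add feature"     # helper.c was forgotten

# Nothing has been published yet, so the commit can simply be redone.
git add helper.c
git commit -q --amend --no-edit

count=$(git rev-list --count HEAD)
files=$(git diff-tree --no-commit-id --name-only -r HEAD | tr '\n' ' ')
echo "$count commits; last one touches: $files"
```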

> It seems safer to commit the entirety of your work in its working state and
> then do a clean checkout from a dedicated publishable branch and manually
> merge the changes in that, test, and commit.

Maybe I did not understand your words, but I am not sure what is gained
in this way... Clearly there is no reason to publish a work that you
have not tested yet. And no one cares about crap that you keep in your
working tree either... So, a better approach is to commit your changes
as a series of patches that can be reviewed easily, then do all testing
and then publish them for integration with the main development branch.

> 
> It seems the intuitive model is to treat version control as applying to the
> whole document, not parts of it. In this respect the document is defined by
> the IDE, namely the entire solution, warts and all.

This is a very bogus idea. If you want to preserve all warts etc, you
just do a backup of the whole disk and now you have a state that can be
compiled any time later (provided that your hardware does not change too
much). In my experience, most cases where I was not able to compile
an old version were caused not by forgetting to commit something, but
by changes in the environment (like a new compiler, new libraries, etc).

But when your commits are fine-grained, you can always cherry-pick the
corresponding fix-up and compile this old version if it is necessary.

In my experience, the value of VCS history is the ability to look at it
(sometimes many years later) and understand who wrote this line and why.
Also, nearly all cases when I had to compile some old version were due
to bisecting some tricky bug. In both cases, having fine-grained commits
was crucial to success.

> When you start
> selectively saving parts of the document then you are doing two things,
> versioning and publishing; and at the same time.

No, you don't. Committing some changes and publishing them are two
separate operations in Git, and that is pretty much fundamental.
Normally, you commit changes in a few separate patches, review them to
make sure that changes match commit messages, do all testing, and only
then you publish them.


Dmitry

* Re: Hey - A Conceptual Simplication....
  2009-11-19  2:03         ` George Dennie
  2009-11-19  7:42           ` Björn Steinbrink
  2009-11-19 10:27           ` Jakub Narebski
@ 2009-11-20  1:48           ` Dmitry Potapov
  2009-11-20  1:55             ` david
  2009-11-20  2:35             ` Björn Steinbrink
  2 siblings, 2 replies; 25+ messages in thread
From: Dmitry Potapov @ 2009-11-20  1:48 UTC (permalink / raw)
  To: George Dennie
  Cc: 'Jason Sewall', 'Jakub Narebski',
	'Jan Krüger',
	git

On Wed, Nov 18, 2009 at 09:03:31PM -0500, George Dennie wrote:
> 
> For example, the functional notion of the repository seems well
> defined: a growing web of immutable commits each created as either an
> isolated commit or more typically an update and/or merger of one or
> more pre-existing commits. 

In Git, commits are not immutable. One thing that many Git users do
is git-rebase, which in essence is re-writing or re-ordering existing
commits. So, you can change history in Git, but you should never change
the published history. (Of course, that leads to the question what is
considered as published history. For instance, commits merged on the
proposed-updates branch are usually not considered to be "published",
so they can be re-written or discarded later).

So, the correct way to use Git is to find the right balance between
the need to clean up after mistakes (using git-rebase) and not doing
too much, so you will not lose important history or create problems
for other people.
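
One way to see what "changing history" means in practice is to rebase a
small branch and compare commit IDs before and after.  This is a sketch
in a throwaway repository with made-up branch and file names.  Note that
the rewritten commit gets a new ID while the old one is still in the
object database for a while; only the branch pointer has moved.

```shell
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
trunk=$(git symbolic-ref --short HEAD)

git checkout -q -b topic
echo "topic" > topic.txt
git add topic.txt
git commit -q -m "Topic work"
old_tip=$(git rev-parse topic)

git checkout -q "$trunk"
echo "more" > trunk.txt
git add trunk.txt
git commit -q -m "Trunk work"

git checkout -q topic
git rebase -q "$trunk"        # replay "Topic work" on top of the new trunk

new_tip=$(git rev-parse topic)
new_parent=$(git rev-parse "topic^")
trunk_tip=$(git rev-parse "$trunk")
old_type=$(git cat-file -t "$old_tip")   # the old commit object still exists
echo "topic moved from $old_tip to $new_tip"
```

This is also why rewriting is harmless before publishing: nobody else
has the old IDs yet.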

> 
> The notion of a shapeless commit is curious. Intuitively, I consider a
> commit as capturing the state of my work at a transactional boundary
> (i.e. a successful unit test...or even lunch break).

No, that is not what Git commits were intended for. In Git, a commit is
a change intended to achieve some goal. Basically, you send a patch
to the maintainer, and you should explain what this patch does and why
it is useful... If your explanation is "I have a lunch break now", it is
a very bad explanation, thus a bad patch.


Dmitry

* Re: Hey - A Conceptual Simplication....
  2009-11-20  1:48           ` Dmitry Potapov
@ 2009-11-20  1:55             ` david
  2009-11-20  2:56               ` Dmitry Potapov
  2009-11-20  2:35             ` Björn Steinbrink
  1 sibling, 1 reply; 25+ messages in thread
From: david @ 2009-11-20  1:55 UTC (permalink / raw)
  To: Dmitry Potapov
  Cc: George Dennie, 'Jason Sewall', 'Jakub Narebski',
	'Jan Krüger',
	git

On Fri, 20 Nov 2009, Dmitry Potapov wrote:

> On Wed, Nov 18, 2009 at 09:03:31PM -0500, George Dennie wrote:
>>
>> For example, the functional notion of the repository seems well
>> defined: a growing web of immutable commits each created as either an
>> isolated commit or more typically an update and/or merger of one or
>> more pre-existing commits.
>
> In Git, commits are not immutable. One thing that many Git users do
> is git-rebase, which in essence is re-writing or re-ordering existing
> commits. So, you can change history in Git, but you should never change
> the published history. (Of course, that leads to the question what is
> considered as published history. For instance, commits merged on the
> proposed-updates branch are usually not considered to be "published",
> so they can be re-written or discarded later).
>
> So, the correct way to use Git is to find the right balance between
> the need to clean up after mistakes (using git-rebase) and not doing
> too much, so you will not lose important history or create problems
> for other people.

The typical advice is to clean up before you make changes public, but not
afterwards.

David Lang

>>
>> The notion of a shapeless commit is curious. Intuitively, I consider a
>> commit as capturing the state of my work at a transactional boundary
>> (i.e. a successful unit test...or even lunch break).
>
> No, it is not what Git commits were intended for. In Git, a commit is
> a change intended to achieve some goal. Basically, you send a patch
> to the maintainer, and you should explain what this patch does and why
> it is useful... If your explanation is "I have a lunch break now", it
> is a very bad explanation, thus a bad patch.
>
>
> Dmitry

* Re: Hey - A Conceptual Simplication....
  2009-11-19 20:12             ` George Dennie
  2009-11-19 21:27               ` Junio C Hamano
  2009-11-20  0:49               ` Jakub Narebski
@ 2009-11-20  2:31               ` Dmitry Potapov
  2 siblings, 0 replies; 25+ messages in thread
From: Dmitry Potapov @ 2009-11-20  2:31 UTC (permalink / raw)
  To: George Dennie
  Cc: git, B.Steinbrink, 'Jason Sewall',
	'Jakub Narebski', 'Jan Krüger',
	torvalds

On Thu, Nov 19, 2009 at 03:12:35PM -0500, George Dennie wrote:
> 
> I think an important piece of conceptual information missing from the docs
> is a concise list of the conceptual properties defining the context of the
> working tree, index, and repository during normal use. This itemization
> would go far in explaining the synergies between the various commands. 

Speaking about "normal use"... I suggest you read about Git workflows:

$ git help gitworkflows

> 
> Functionally, all the commands merely manipulate these properties. If these
> properties were summarized in context, one would expect that to represent a
> very complete functional model of Git. A user could review the description,
> figure out what they wanted to do, and then find the command(s) to
> accomplish it.

It is like saying that driving a car merely means manipulating its
components, so if these components were summarized, it would be all
that one needs to know to drive a car...

While I don't dispute that a basic understanding of key Git concepts is
important, an understanding of a typical Git workflow cannot be deduced
from knowledge of the separate parts. Now if I were to describe Git in
just a few words, I would say that a Git repository is just a DAG of
objects, the working tree is the place where you work, and the index is
what helps you to create fine-grained commits and do merges. But that
says very little (if anything) about how to use it.
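
A toy sketch of that three-part picture, written in Python rather than
with Git itself. The names `working_tree`, `index`, `add` and `commit`
are made up for illustration and only loosely mirror the real commands;
plain dicts stand in for Git's data structures. It shows how staging
lets two unrelated edits in one working tree end up in two commits.

```python
# Toy model only: plain dicts stand in for Git's real data structures.
working_tree = {
    "parser.c": "parser with bug fix",
    "lexer.c": "lexer half-way through a refactoring",
}
index = {}     # what the next commit will contain
commits = []   # each commit records its parent, forming a (linear) DAG


def add(path):
    """Stage the current working-tree content of one file."""
    index[path] = working_tree[path]


def commit(message):
    """Snapshot the index, not the working tree, as a new commit."""
    parent = len(commits) - 1 if commits else None
    commits.append({"message": message, "parent": parent, "tree": dict(index)})
    return len(commits) - 1


# Only the finished bug fix is staged; the unfinished lexer work stays out.
add("parser.c")
fix = commit("Fix off-by-one in parser")

# Later the refactoring is finished and committed on its own.
working_tree["lexer.c"] = "lexer after finished refactoring"
add("lexer.c")
refactor = commit("Refactor lexer")

assert "lexer.c" not in commits[fix]["tree"]
assert commits[refactor]["parent"] == fix
```

As the message says, this describes the parts and not a workflow.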


Dmitry

* Re: Hey - A Conceptual Simplication....
  2009-11-20  1:48           ` Dmitry Potapov
  2009-11-20  1:55             ` david
@ 2009-11-20  2:35             ` Björn Steinbrink
  2009-11-20  3:08               ` Dmitry Potapov
  1 sibling, 1 reply; 25+ messages in thread
From: Björn Steinbrink @ 2009-11-20  2:35 UTC (permalink / raw)
  To: Dmitry Potapov
  Cc: George Dennie, 'Jason Sewall', 'Jakub Narebski',
	'Jan Krüger',
	git

On 2009.11.20 04:48:44 +0300, Dmitry Potapov wrote:
> On Wed, Nov 18, 2009 at 09:03:31PM -0500, George Dennie wrote:
> > 
> > For example, the functional notion of the repository seems well
> > defined: a growing web of immutable commits each created as either an
> > isolated commit or more typically an update and/or merger of one or
> > more pre-existing commits. 
> 
> In Git, commits are not immutable.

Commits _are_ immutable, like all git objects (blobs, trees, commits,
tags). "Rewriting" history actually means creating a new history (adding
objects), and then changing a ref (most often a branch head) to
reference the new history instead of the old one.
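
A minimal sketch of that distinction, as a toy Python model and not
Git's implementation. `ToyRepo` and its methods are hypothetical names,
and the commit body is simplified (a real commit also records a tree,
an author and a committer). The naming scheme, SHA-1 over a
"<type> <size>" header, a NUL byte and the content, does follow how Git
names objects.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Name an object as Git does: SHA-1 of '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


class ToyRepo:
    def __init__(self):
        self.objects = {}  # id -> body; entries are only ever added
        self.refs = {}     # branch name -> commit id; entries can be repointed

    def commit(self, message, parent=None):
        # Simplified commit body: an optional parent line, then the message.
        lines = [f"parent {parent}"] if parent else []
        body = ("\n".join(lines + ["", message]) + "\n").encode()
        oid = object_id("commit", body)
        self.objects[oid] = body
        return oid


repo = ToyRepo()
base = repo.commit("Initial import")
first = repo.commit("Add parser", parent=base)
repo.refs["master"] = first

# "Amending" builds a second object and moves the ref. The first object
# cannot change: a different body hashes to a different id.
amended = repo.commit("Add parser and tests", parent=base)
repo.refs["master"] = amended

assert amended != first
assert first in repo.objects          # the old commit is still stored
assert repo.refs["master"] == amended
```

The only thing that was modified in place is the `refs` entry.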

Björn

* Re: Hey - A Conceptual Simplication....
  2009-11-20  1:55             ` david
@ 2009-11-20  2:56               ` Dmitry Potapov
  0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Potapov @ 2009-11-20  2:56 UTC (permalink / raw)
  To: david
  Cc: George Dennie, 'Jason Sewall', 'Jakub Narebski',
	'Jan Krüger',
	git

On Thu, Nov 19, 2009 at 05:55:21PM -0800, david@lang.hm wrote:
> On Fri, 20 Nov 2009, Dmitry Potapov wrote:
>
>> So, the correct way to use Git is to find the right balance between
>> the need to clean up after mistakes (using git-rebase) and not doing
>> too much, so you will not lose important history or create problems
>> for other people.
>
> The typical advice is to clean up before you make changes public, but not
> afterwards.

True, except patches may get additional clean up or improvements based
on review feedback, or even get some small fix-ups while they live on
'pu'. But re-writing something that other people may base their work on
is clearly wrong. On the other hand, rebasing a large series of patches,
even if it has never been published, may be the wrong way to go, because
you replace well-tested states with others that were never tested.
So if it is a long and complex series of patches, chances are high that
you can break something in it. So, it takes some judgement to decide
when to use git-rebase and when git-merge.


Dmitry

* Re: Hey - A Conceptual Simplication....
  2009-11-20  2:35             ` Björn Steinbrink
@ 2009-11-20  3:08               ` Dmitry Potapov
  0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Potapov @ 2009-11-20  3:08 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: George Dennie, 'Jason Sewall', 'Jakub Narebski',
	'Jan Krüger',
	git

On Fri, Nov 20, 2009 at 03:35:40AM +0100, Björn Steinbrink wrote:
> On 2009.11.20 04:48:44 +0300, Dmitry Potapov wrote:
> > On Wed, Nov 18, 2009 at 09:03:31PM -0500, George Dennie wrote:
> > > 
> > > For example, the functional notion of the repository seems well
> > > defined: a growing web of immutable commits each created as either an
> > > isolated commit or more typically an update and/or merger of one or
> > > more pre-existing commits. 
> > 
> > In Git, commits are not immutable.
> 
> Commits _are_ immutable, like all git objects (blobs, trees, commits,
> tags). "Rewriting" history actually means creating a new history (adding
> objects), and then changing a ref (most often a branch head) to
> reference the new history instead of the old one.

I stand corrected. All objects in a Git repository are actually
immutable, but because references can be changed (and tools like
git-rebase change them automatically), it _appears_ as if existing
commits were edited. In fact, old commits do not disappear immediately.
Even if no other branches or tags refer to the old commits, the reflog
keeps references to them for 30 days (by default), after which the
garbage collector can remove them.
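
The retention rule can be pictured with a small Python sketch. This is
a toy model under loud assumptions: `collect_garbage` and `REFLOG_TTL`
are hypothetical names, each commit is treated in isolation (no walk
over parents), and the 30-day figure is simply the one quoted above.

```python
from datetime import datetime, timedelta

REFLOG_TTL = timedelta(days=30)  # the figure quoted in the message above


def collect_garbage(objects, refs, reflog, now):
    """Return the objects that survive: those named by a ref or a fresh reflog entry."""
    keep = set(refs.values())
    keep |= {oid for oid, when in reflog if now - when <= REFLOG_TTL}
    return {oid: body for oid, body in objects.items() if oid in keep}


t0 = datetime(2009, 11, 20)
objects = {"old": "commit before rebase", "new": "commit after rebase"}
refs = {"master": "new"}             # the rebase repointed the branch
reflog = [("old", t0), ("new", t0)]  # the reflog remembers both values

after_week = collect_garbage(objects, refs, reflog, t0 + timedelta(days=7))
after_two_months = collect_garbage(objects, refs, reflog, t0 + timedelta(days=60))

assert "old" in after_week             # still recoverable shortly after
assert "old" not in after_two_months   # eligible for removal once expired
assert "new" in after_two_months       # the branch keeps the new commit alive
```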


Dmitry

* Re: Hey - A Conceptual Simplication....
  2009-11-20  0:49               ` Jakub Narebski
@ 2009-11-20  6:27                 ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2009-11-20  6:27 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: George Dennie, git, B.Steinbrink, 'Jason Sewall',
	'Jan Krüger',
	torvalds

Jakub Narebski <jnareb@gmail.com> writes:

> If you didn't find sufficient description of underlying concepts behind
> git in "Git User's Manual" (distributed with Git), "Git Community Book"
> or "Pro Git", take a look at the following documents:
>
>  * "Git for Computer Scientists"
>  * "Git From Bottom's Up"
>  * "The Git Parable"
> ...
> It is documented, see the references mentioned above.

I would actually want us to step back a bit and make sure that anybody
who is completely new to git won't get confused with the concepts after
s/he reads our "Git User's Manual" and nothing else.  Listing five or six
documents and saying "you'll find information somewhere among these"
*might* be the best thing we could do at this very second, but we should
strive to do better than that.

* Re: Hey - A Conceptual Simplication....
  2009-11-20  1:35     ` Dmitry Potapov
@ 2009-11-20  6:33       ` Junio C Hamano
  2009-11-20 15:07         ` Dmitry Potapov
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2009-11-20  6:33 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: George Dennie, 'Jan Krüger', git

Dmitry Potapov <dpotapov@gmail.com> writes:

> It is more difficult to make this mistake with Git than with many
> other VCSes, because Git shows the list of files that are changed but not
> committed as well as the list of untracked files when you try to commit
> something.

Not really in practice.  Too many people carry over the habit, acquired
with their previous SCM, of using -m to write a useless one-line commit
log message.  Arguably, useless log messages are less of a problem on
systems like CVS/SVN because they do not do useful log summarization
such as "log -- paths..." or "shortlog", so people can be excused for
learning the practice in the first place, though.

That incidentally is exactly why earlier we (mostly me and Linus)
recommended people not to teach "commit -m" to new people, but of course
nobody listened ;-).

* Re: Hey - A Conceptual Simplication....
  2009-11-20  6:33       ` Junio C Hamano
@ 2009-11-20 15:07         ` Dmitry Potapov
  0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Potapov @ 2009-11-20 15:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: George Dennie, 'Jan Krüger', git

On Thu, Nov 19, 2009 at 10:33:05PM -0800, Junio C Hamano wrote:
> Dmitry Potapov <dpotapov@gmail.com> writes:
> 
> > It is more difficult to make this mistake with Git than with many
> > other VCSes, because Git shows the list of files that are changed but not
> > committed as well as the list of untracked files when you try to commit
> > something.
> 
> Not really in practice.  Too many people carry their existing practice of
> using -m to write a useless single liner commit log message that they
> acquired while using their previous SCM.

Well, at least Git allows you to avoid this mistake and produce good
commit messages, but you are right that it is difficult to break old bad
habits...

> Arguably, useless log messages
> are less of a problem on systems like CVS/SVN because they do not do
> useful log summarization such as "log -- paths..." or "shortlog", so they
> can be excused for learning the practice in the first place, though.

I think quite often commits in CVS/SVN cannot be summarized, because a
single commit often contains what would be a short series of patches in
Git plus a few separate fix-ups that are completely unrelated to the
whole series. It is trivial to split your changes into a few separate
commits in Git, but it is difficult to do that with CVS/SVN.

> That incidentally is exactly why earlier we (mostly me and Linus)
> recommended people not to teach "commit -m" to new people, but of course
> nobody listened ;-).

Those who got used to '-m' in another VCS will quickly find it on their
own... BTW, the Git User's Manual uses "git commit -m" 8 times in
different examples, largely to explain what is being committed there, and
I think it is similar with other introductions to Git. Though, clearly,
'-m' is rarely useful in practice...


Dmitry
