* [RFC] A Change to Commit IDs Too Ridiculous to Consider?
@ 2016-07-24 18:12 Jon Forrest
  2016-07-24 18:46 ` Jakub Narębski
  2016-07-24 18:51 ` Rodrigo Campos
  0 siblings, 2 replies; 10+ messages in thread
From: Jon Forrest @ 2016-07-24 18:12 UTC (permalink / raw)
  To: git


Those of us who write instructional material about Git all face the same problem:
we can't write step-by-step instructions that show the results of
making a commit, because users will always see different commit IDs.
This is fundamental to the design of Git.

Even if the instructional material tells users to use standard author and committer
information (e.g. john.doe@example.com) and shows the text of the file being committed
and the commit message to add, the resulting commit ID will differ from reader to reader
since the commit will presumably take place at different times.

What if it were possible, for instructional purposes only, to somehow tell Git to relax
this requirement? By this I mean, the commit date would *not* be included when constructing
the commit ID. This would allow tutorials to show exactly what to expect to see when running commands.

I realize that questions would remain such as how to turn on this behavior (e.g. command line flags,
environment variables) and whether 'git log' (and maybe other commands) should somehow distinguish these
mutant commits. There would probably be other issues to consider.

Again, this is for instructional purposes only, and only when the committer explicitly
chooses to use this option. I'm *not* proposing a general change to Git's behavior.

Is such a thing too ridiculous to even consider? Is there a better way to achieve the same result?

Jon Forrest




* Re: [RFC] A Change to Commit IDs Too Ridiculous to Consider?
  2016-07-24 18:12 [RFC] A Change to Commit IDs Too Ridiculous to Consider? Jon Forrest
@ 2016-07-24 18:46 ` Jakub Narębski
  2016-07-24 19:20   ` Jon Forrest
  2016-07-24 18:51 ` Rodrigo Campos
  1 sibling, 1 reply; 10+ messages in thread
From: Jakub Narębski @ 2016-07-24 18:46 UTC (permalink / raw)
  To: Jon Forrest, git

Please try to keep to the 80-character lines.

On 2016-07-24 at 20:12, Jon Forrest wrote:

> Those of us who write instructional material about Git all face the
> same problem. This is that we can't write step by step instructions
> that show the results of making a commit because users will always
> see different commit IDs. This is fundamental to the design of Git.
> 
> Even if the instructional material tells users to use standard author
> and committer information, (e.g. john.doe@example.com) and shows the
> text of the file being committed and the commit message to add, the
> resulting commit ID will differ from reader to reader since the
> commit will presumably take place at different times.

There are two options. The first is to tell the reader upfront that
object IDs will be different.  This has the advantage that you do not
need to update those IDs when you change instructions in the middle.
Note that commit objects are not the only things that change; for
example, the result of `ls -l` would also be slightly different.

The other possibility is to set the author date and committer date to
some specified time by way of the appropriate environment variables.

> 
> What if it were possible, for instructional purposes only, to somehow
> tell Git to relax this requirement. By this I mean, the commit date
> would *not* be included when constructing the commit ID. This would
> allow tutorials to show exactly what to expect to see when running
> commands.

What I think you don't realize is that "commit" objects are not
treated specially in any way. The object identifier of every object is
the SHA-1 hash of the uncompressed loose representation of that object
(type + length + contents).
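
To make the "type + length + contents" rule concrete, here is a minimal
sketch that rebuilds a blob's ID by hand and compares it with what Git
reports. It assumes `sha1sum` is installed and that Git is using its
default SHA-1 object format; the variable names are only illustrative.

```shell
# Sketch: reproduce a blob's object ID by hand.
content='Test'

# Size in bytes of the contents, including the trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "<type> <size>" + NUL byte + contents, as described above.
manual_id=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

# Ask Git for the ID of the same contents (no repository needed).
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual_id"
echo "git:    $git_id"
```

The two lines should show the same ID. Commit objects are hashed the
same way, only with the type "commit" and the commit text (including
the author and committer lines with their dates) as the contents.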

Well, you could leave the dates out of the commit object, but I think
Git considers such objects broken.

> 
> I realize that questions would remain such as how to turn on this
> behavior (e.g. command line flags, environment variables) and whether
> 'git log' (and maybe other commands) should somehow distinguish
> these mutant commits. There would probably be other issues to
> consider.
> 
> Again, this is for instructional purposes only, and only when the
> committer explicitly chooses to use this option. I'm *not* proposing
> a general change to Git's behavior.
> 
> Is such a thing too ridiculous to even consider? Is there a better way
> to achieve the same result?

IMVHO it would require heavy surgery of Git for little benefit
(see the beginning of reply for alternate solutions).
 
-- 
Jakub Narębski



* Re: [RFC] A Change to Commit IDs Too Ridiculous to Consider?
  2016-07-24 18:12 [RFC] A Change to Commit IDs Too Ridiculous to Consider? Jon Forrest
  2016-07-24 18:46 ` Jakub Narębski
@ 2016-07-24 18:51 ` Rodrigo Campos
  2016-07-24 19:57   ` Jon Forrest
  1 sibling, 1 reply; 10+ messages in thread
From: Rodrigo Campos @ 2016-07-24 18:51 UTC (permalink / raw)
  To: Jon Forrest; +Cc: git

On Sun, Jul 24, 2016 at 11:12:12AM -0700, Jon Forrest wrote:
> 
> Those of us who write instructional material about Git all face the same problem.
> This is that we can't write step by step instructions that show the results of
> making a commit because users will always see different commit IDs.
> This is fundamental to the design of Git.
> 
> Even if the instructional material tells users to use standard author and committer
> information, (e.g. john.doe@example.com) and shows the text of the file being committed
> and the commit message to add, the resulting commit ID will differ from reader to reader
> since the commit will presumably take place at different times.

And what is the problem with that, if you are doing it for instructional
purposes? Let's assume that this helps rather than confuses later, when the
commits *do* change. What is the problem you face?

I mean, for some examples you can use HEAD, HEAD^, HEAD~4, etc. and that always
works, no matter the commit id. In which cases do you want/need the commit ids
to be equal? Can you be more specific?
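
As a rough illustration of the relative names mentioned above, the
following sketch builds a throwaway repository with three commits and
looks them up by position rather than by ID. The identity, file name and
commit messages are made up for the example.

```shell
# Sketch: relative names give the same answer whatever the commit IDs are.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical identity, set only for this example.
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john.doe@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john.doe@example.com'

for n in 1 2 3; do
    echo "line $n" >> README
    git add README
    git commit -q -m "commit $n"
done

# Print the subject of the newest commit, its parent, and its grandparent.
git log --format='%s' -1 HEAD      # commit 3
git log --format='%s' -1 HEAD^     # commit 2
git log --format='%s' -1 HEAD~2    # commit 1
```

Every reader gets different IDs here, but the three subjects come out
the same for everyone.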




Thanks a lot,
Rodrigo


* Re: [RFC] A Change to Commit IDs Too Ridiculous to Consider?
  2016-07-24 18:46 ` Jakub Narębski
@ 2016-07-24 19:20   ` Jon Forrest
  2016-07-24 20:42     ` Jakub Narębski
  0 siblings, 1 reply; 10+ messages in thread
From: Jon Forrest @ 2016-07-24 19:20 UTC (permalink / raw)
  To: Jakub Narębski, git



On 7/24/2016 11:46 AM, Jakub Narębski wrote:
> Please try to keep to the 80-character lines.

Sorry.

> Another possibility is to set authordate and committerdate to some
> specified time by the way of appropriate environment variables.

That sounds like a great idea. Assuming it
works the way I envision, this wouldn't require
any changes to the source code.

> What I think you don't realize is that "commit" objects are not
> treated in any way special. Object identifiers of all objects are
> SHA-1 hash of uncompressed loose representation of said object
> (type + length + contents).

I know this, but I thought that commit object IDs were the only
ones that included a date in what gets run through the SHA-1
hash function. If there are others, then you're right - they'd
need to be included in this proposal.

> Well, you could not record dates in commit object, but I think
> Git considers such objects broken.

You mean that Git could, after the fact, detect commit IDs
that didn't include a date? If this is true, then your
idea of using fixed dates from environment variables
would be the only way to do this.

> IMVHO it would require heavy surgery of Git for little benefit
> (see the beginning of reply for alternate solutions).

Even using your environment variable solution that wouldn't
require any code changes?

Jon



* Re: [RFC] A Change to Commit IDs Too Ridiculous to Consider?
  2016-07-24 18:51 ` Rodrigo Campos
@ 2016-07-24 19:57   ` Jon Forrest
  2016-07-25 15:03     ` Junio C Hamano
  2016-07-26 14:26     ` Philip Oakley
  0 siblings, 2 replies; 10+ messages in thread
From: Jon Forrest @ 2016-07-24 19:57 UTC (permalink / raw)
  To: Rodrigo Campos; +Cc: git



On 7/24/2016 11:51 AM, Rodrigo Campos wrote:
> And what is the problem with that, if you are doing it with instructional
> purposes? Let's assume that this helps and not confuses later when the commits
> *do* change. What is the problem you face?

A lot of instructional material contains stuff like "Do [xxx] and you'll
see [zzz]. If you don't then something went wrong so try to figure out
what happened and do it again."

Git, as it stands, for good reason doesn't allow this approach.

I don't think a Git beginner, when using a version of Git that somehow
works the way I proposed, will be confused. The fact that performing the
same steps results in the same commit IDs won't be something that
they'll care about or even notice. The material can include a callout
mentioning the difference between "real" Git and "learners" Git.

> I mean, for some examples you can use HEAD, HEAD^, HEAD~4, etc. and that always
> works, no matter the commit id.

This will work in some cases, but should come later in a Git book.
But, in many cases using relative commit IDs, rather than absolute,
will be less clear (I believe).

> In which cases do you want/need the commit ids to be equal?
> Can you be more specific?

Sure. Take a look at the 2nd or 3rd chapter of Pro Git Reedited, 2nd
Edition (or just Pro Git 2nd Edition - it doesn't matter). You see
lots of output showing 'git commit' commands and the commit IDs that
result. I suspect you'd see the same in almost any book about Git.

Jon


* Re: [RFC] A Change to Commit IDs Too Ridiculous to Consider?
  2016-07-24 19:20   ` Jon Forrest
@ 2016-07-24 20:42     ` Jakub Narębski
  2016-07-25  3:56       ` Jon Forrest
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Narębski @ 2016-07-24 20:42 UTC (permalink / raw)
  To: Jon Forrest, git

On 2016-07-24 at 21:20, Jon Forrest wrote:
> On 7/24/2016 11:46 AM, Jakub Narębski wrote:
>> 
>> Another possibility is to set authordate and committerdate to some
>> specified time by the way of appropriate environment variables.
> 
> That sounds like a great idea. Assuming it
> works the way I envision, this wouldn't require
> any changes to the source code.

This would, however, require the user to type more.
The environment variables are GIT_AUTHOR_DATE and GIT_COMMITTER_DATE;
their format is described in the "DATE FORMATS" section in the
git-commit(1) manpage.
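
For illustration, a small sketch of setting both variables for a single
command and then looking at the raw commit object. It uses the "Git
internal format" (`<unix timestamp> <time zone offset>`) from that
manpage section; the identity and the timestamp are arbitrary choices
for the example.

```shell
# Sketch: fix both dates for one commit and inspect the stored object.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical identity, set only for this example.
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john.doe@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john.doe@example.com'

echo "Test" > README
git add README

# The variables apply to this one command only.
GIT_AUTHOR_DATE='1112911993 -0700' GIT_COMMITTER_DATE='1112911993 -0700' \
    git commit -q -m "test"

# Print the raw commit: the author and committer lines end with the date.
git cat-file -p HEAD
```

The author and committer lines in the output carry the timestamp and
offset that were passed in, which is why they influence the commit ID.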

And it is not something that the user would do when working
with Git themselves, for their own project.

>> What I think you don't realize is that "commit" objects are not
>> treated in any way special. Object identifiers of all objects are
>> SHA-1 hash of uncompressed loose representation of said object
>> (type + length + contents).
> 
> I know this, but I thought that commit object IDs were the only
> ones that included a date in what gets run through the SHA-1
> hash function. If there are others, then you're right - they'd
> need to be included in this proposal.

The problem is that many functions in Git are object-type agnostic.
Changing how 'commit' objects are treated would require heavy code
surgery... unless done as a filter (see below), but even then extra
code would be needed, for small benefit and a large maintenance
burden.

Now that I think about this, it could be done when displaying
object names (in `git log` and `git show`), replacing the true commit
object name with the SHA-1 of that object with the dates stripped.
Still needs work.

>> Well, you could not record dates in commit object, but I think
>> Git considers such objects broken.
> 
> You mean that Git could, after the fact, detect commit IDs
> that didn't include a date? If this is true, then your
> idea of using fixed dates from environment variables
> would be the only way to do this.

I think^H^H `git fsck` can check that objects are well formed,
and warn if they are not.

$ git fsck
error in commit 6deb0829fecdf1feab0cc7c66061a92a93cb19e7: 
  missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in commit 762f28c2567c07d378d485c3e2a498947d49f406: 
  badDate: invalid author/committer line - bad date

>> IMVHO it would require heavy surgery of Git for little benefit
>> (see the beginning of reply for alternate solutions).
> 
> Even using your environment variable solution that wouldn't
> require any code changes?

No, this does not need any changes to the Git code, of course.

-- 
Jakub Narębski



* Re: [RFC] A Change to Commit IDs Too Ridiculous to Consider?
  2016-07-24 20:42     ` Jakub Narębski
@ 2016-07-25  3:56       ` Jon Forrest
  0 siblings, 0 replies; 10+ messages in thread
From: Jon Forrest @ 2016-07-25  3:56 UTC (permalink / raw)
  To: Jakub Narębski, git


>>> Another possibility is to set authordate and committerdate to some
>>> specified time by the way of appropriate environment variables.

To follow up, Jakub's approach works great without
requiring any changes to Git.

For example, the following test script always
produces the same commit ID:

----
export GIT_AUTHOR_DATE=2005-04-07T22:13:13
export GIT_COMMITTER_DATE=2005-04-07T22:13:13
mkdir -p /tmp/test
cd /tmp/test
rm -rf .git
git init
echo "Test" > README
git add README
git commit -m "test"
git log
----

As expected, commenting out the 2 export lines results in
different commit IDs each time.
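
A slightly more defensive variant of the same idea, as a sketch only:
the script above relies on the reader's configured name and e-mail and,
as far as I can tell, on the local time zone (the dates carry no
offset), so two readers could still see different IDs. The version
below pins those too. The identity, the `make_commit` helper name and
the use of `mktemp` are assumptions for the example.

```shell
# Sketch: pin everything that feeds into the commit, then commit twice.
export GIT_AUTHOR_NAME='John Doe' GIT_AUTHOR_EMAIL='john.doe@example.com'
export GIT_COMMITTER_NAME='John Doe' GIT_COMMITTER_EMAIL='john.doe@example.com'
# 2005-04-07T22:13:13 UTC, with an explicit offset.
export GIT_AUTHOR_DATE='1112911993 +0000'
export GIT_COMMITTER_DATE='1112911993 +0000'

# Hypothetical helper: make one commit in a fresh repository, print its ID.
make_commit() {
    dir=$(mktemp -d)
    (
        cd "$dir" &&
        git init -q &&
        echo "Test" > README &&
        git add README &&
        git commit -q -m "test" &&
        git rev-parse HEAD
    )
    rm -rf "$dir"
}

first_id=$(make_commit)
sleep 1
second_id=$(make_commit)
echo "$first_id"
echo "$second_id"
```

Both lines should print the same ID, even though the two commits were
made at different wall-clock times in different directories.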

Case closed.

Thanks, Jakub!

Jon Forrest



* Re: [RFC] A Change to Commit IDs Too Ridiculous to Consider?
  2016-07-24 19:57   ` Jon Forrest
@ 2016-07-25 15:03     ` Junio C Hamano
  2016-07-25 17:11       ` Junio C Hamano
  2016-07-26 14:26     ` Philip Oakley
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2016-07-25 15:03 UTC (permalink / raw)
  To: Jon Forrest; +Cc: Rodrigo Campos, git

Jon Forrest <nobozo@gmail.com> writes:

> Sure. Take a look at the 2nd or 3rd chapter of Pro Git Reedited, 2nd
> Edition (or just Pro Git 2nd Edition - it doesn't matter). You see
> lots of output showing 'git commit' commands and the commit IDs that
> result. I suspect you'd see the same in almost any book about Git.

I would think that the early-stage learners are better served that
it is the norm, not anything strange, that the commit object name
would be different when you do two identical sequence from scratch.
Forcing them to know GIT_*_DATE variables, just to give them an
impression as if setting of them is part of any normal workflow (or
more importantly, stable commit IDs made by different people at
different times is something expected), is doing them double
disservice, IMHO.


* Re: [RFC] A Change to Commit IDs Too Ridiculous to Consider?
  2016-07-25 15:03     ` Junio C Hamano
@ 2016-07-25 17:11       ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2016-07-25 17:11 UTC (permalink / raw)
  To: Jon Forrest; +Cc: Rodrigo Campos, git

Junio C Hamano <gitster@pobox.com> writes:

> Jon Forrest <nobozo@gmail.com> writes:
>
>> Sure. Take a look at the 2nd or 3rd chapter of Pro Git Reedited, 2nd
>> Edition (or just Pro Git 2nd Edition - it doesn't matter). You see
>> lots of output showing 'git commit' commands and the commit IDs that
>> result. I suspect you'd see the same in almost any book about Git.
>
> I would think that the early-stage learners are better served that
> it is the norm, not anything strange, that the commit object name
> would be different when you do two identical sequence from scratch.

Ehh, sent without proofreading.  The above should say "... are
better served if the book taught that it is the norm, ...".

> Forcing them to know GIT_*_DATE variables, just to give them an
> impression as if setting of them is part of any normal workflow (or
> more importantly, stable commit IDs made by different people at
> different times is something expected), is doing them double
> disservice, IMHO.


* Re: [RFC] A Change to Commit IDs Too Ridiculous to Consider?
  2016-07-24 19:57   ` Jon Forrest
  2016-07-25 15:03     ` Junio C Hamano
@ 2016-07-26 14:26     ` Philip Oakley
  1 sibling, 0 replies; 10+ messages in thread
From: Philip Oakley @ 2016-07-26 14:26 UTC (permalink / raw)
  To: Rodrigo Campos, Jon Forrest; +Cc: git

From: "Jon Forrest" <nobozo@gmail.com>
> On 7/24/2016 11:51 AM, Rodrigo Campos wrote:
>> And what is the problem with that, if you are doing it with instructional
>> purposes? Let's assume that this helps and not confuses later when the 
>> commits
>> *do* change. What is the problem you face?
>
> A lot of instructional material contains stuff like "Do [xxx] and you'll
> see [zzz]. If you don't then something went wrong so try to figure out
> what happened and do it again."
>
> Git, as it stands, for good reason doesn't allow this approach.

You may want to look at how the test suite handles the need for well defined 
commit sequences.

It's not something I've really studied, but I am aware of test_tick, which 
increments the time, and similar helpers.
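
A rough sketch of what such a helper might look like in a tutorial
script. This is modelled loosely on the idea of test_tick, not copied
from the test suite; the name `tick`, the starting timestamp and the
60-second step are assumptions for the example.

```shell
# Hypothetical helper: each call moves a fixed clock forward and exports
# it as both the author date and the committer date.
tick() {
    if [ -z "${tick_time+set}" ]; then
        tick_time=1112911993               # first call: fixed starting point
    else
        tick_time=$((tick_time + 60))      # later calls: advance one minute
    fi
    GIT_AUTHOR_DATE="$tick_time -0700"
    GIT_COMMITTER_DATE="$tick_time -0700"
    export GIT_AUTHOR_DATE GIT_COMMITTER_DATE
}

tick
first_date=$GIT_COMMITTER_DATE
tick
second_date=$GIT_COMMITTER_DATE
echo "$first_date"     # 1112911993 -0700
echo "$second_date"    # 1112912053 -0700
```

Calling `tick` before each `git commit` gives every commit in a sequence
a distinct but predictable date.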

There is a big learning step that many beginners need to get over: they 
have no concept of a DVCS, nor of multiple master copies (which to most is 
an oxymoron!), nor of why the SHA is a good solution and serial numbers are 
a bad one!

Being able to do a few "Hello World" commits starting at unix t=0, and then 
progressing on to see how they differ when made at unix t=now, or with 
their own user IDs, could be a useful step for those that need it.

>
> I don't think a Git beginner, when using a version of Git that somehow
> works the way I proposed, will be confused. The fact that performing the
> same steps results in the same commit IDs won't be something that
> they'll care about or even notice. The material can include a callout
> mentioning the difference between "real" Git and "learners" Git.
>
>> I mean, for some examples you can use HEAD, HEAD^, HEAD~4, etc. and that 
>> always
>> works, no matter the commit id.
>
> This will work in some cases, but should come later in a Git book.
> But, in many cases using relative commit IDs, rather than absolute,
> will be less clear (I believe).
>
>> In which cases do you want/need the commit ids to be equal?
>> Can you be more specific?
>
> Sure. Take a look at the 2nd or 3rd chapter of Pro Git Reedited, 2nd
> Edition (or just Pro Git 2nd Edition - it doesn't matter). You see
> lots of output showing 'git commit' commands and the commit IDs that
> result. I suspect you'd see the same in almost any book about Git.
>
> Jon
> --
Philip 



