obsolete index in wt_status_print after pre-commit hook runs

All of lore.kernel.org
 help / color / mirror / Atom feed

* obsolete index in wt_status_print after pre-commit hook runs
@ 2016-07-15 16:34 Andrew Keller
  2016-07-15 17:02 ` Junio C Hamano
  2016-07-15 20:30 ` Andrew Keller
  0 siblings, 2 replies; 13+ messages in thread
From: Andrew Keller @ 2016-07-15 16:34 UTC (permalink / raw)
  To: Git List

Hi everyone,

I have observed an interesting scenario.  Here are example reproduction steps:

1. new repository
2. create new pre-commit hook that invokes `git mv one two`
3. touch one
4. git add one
5. git commit

Expected outcome: In the commit message template, I expect to see “Changes to be committed: new file: two"

Found outcome: In the commit message template, I see “Changes to be committed: new file: one"

This behavior seems to be reproducible in versions 2.9.1, 2.8.1, 2.0.0, and 1.6.0.

Skip the next 3 paragraphs if you are in a hurry.

I pulled out the source for version 2.9.1 and briefly skimmed how run_commit and
prepare_to_commit work.  It seems that Git already understands that a pre-commit
hook can change the index, and it rereads the index before running the
prepare-commit-msg hook: https://github.com/git/git/blob/v2.9.1/builtin/commit.c#L941-L951

During the prepare-commit-msg hook, it seems that the index (according to Git
commands) is correct and up-to-date, but the textual message inside the commit
message template is out-of-date (it references the file `one` as a change to be
committed).

In builtin/commit.c, it seems that the commit message template is rendered
immediately after the pre-commit hook is ran, and immediately before the index is
reread.  If I move the small block of code that rereads the index up, to just after
the pre-commit hook is ran, the commit message template seems to be as I would
expect, both in .git/COMMIT_EDITMSG during the prepare-commit-msg hook and
in the editor for the commit message itself.

I am putting together a 2-patch series that includes a failing test, and then this
change (which fixes the test), but while I do that, I figure I may as well ping the
community to make sure that this behavior is not intentional.  I’d wager that this
change is for the better, but since this behavior has been around so long (I stopped
checking at 1.6.0), it doesn’t hurt to make sure.

Any comments, concerns, or advice?

Thanks,
 - Andrew Keller

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 16:34 obsolete index in wt_status_print after pre-commit hook runs Andrew Keller
@ 2016-07-15 17:02 ` Junio C Hamano
  2016-07-15 17:20   ` Andrew Keller
  2016-07-15 20:30 ` Andrew Keller
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-07-15 17:02 UTC (permalink / raw)
  To: Andrew Keller; +Cc: Git List

Andrew Keller <andrew.keller@covenanteyes.com> writes:

> I have observed an interesting scenario.  Here are example reproduction steps:
>
> 1. new repository
> 2. create new pre-commit hook that invokes `git mv one two`
> 3. touch one
> 4. git add one
> 5. git commit
>
> Expected outcome: In the commit message template, I expect to see
> “Changes to be committed: new file: two"

Expected outcome is an error saying "do not modify the index inside
pre-commit hook", and a rejection.  It was meant as a verification
mechansim (hence it can be bypassed with --no-verify), not as a way
to make changes that the user didn't tell "git commit" to make.

It is just the implementation that dates back to the old days were
too trusting that all users would behave (with its own definition of
"behaving well", which may or may not match your expectation), did
not anticipate that people would try to muck with the contents being
commited in the hook, and did not implement such verification.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 17:02 ` Junio C Hamano
@ 2016-07-15 17:20   ` Andrew Keller
  2016-07-15 17:28     ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Keller @ 2016-07-15 17:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

On 15.07.2016, at 1:02 nachm., Junio C Hamano <gitster@pobox.com> wrote:

> Expected outcome is an error saying "do not modify the index inside
> pre-commit hook", and a rejection.  It was meant as a verification
> mechansim (hence it can be bypassed with --no-verify), not as a way
> to make changes that the user didn't tell "git commit" to make.

Ah!  Good to know, then.  I’ll rewrite my hook to behave more correctly.

Thanks,
 - Andrew Keller


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 17:20   ` Andrew Keller
@ 2016-07-15 17:28     ` Junio C Hamano
  2016-07-15 17:42       ` Andrew Keller
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-07-15 17:28 UTC (permalink / raw)
  To: Andrew Keller; +Cc: Git List

Andrew Keller <andrew@kellerfarm.com> writes:

> On 15.07.2016, at 1:02 nachm., Junio C Hamano <gitster@pobox.com> wrote:
>
>> Expected outcome is an error saying "do not modify the index inside
>> pre-commit hook", and a rejection.  It was meant as a verification
>> mechansim (hence it can be bypassed with --no-verify), not as a way
>> to make changes that the user didn't tell "git commit" to make.
>
> Ah!  Good to know, then.  I’ll rewrite my hook to behave more correctly.

No problem.

>> It is just the implementation that dates back to the old days were
>> too trusting that all users would behave (with its own definition of
>> "behaving well", which may or may not match your expectation), did
>> not anticipate that people would try to muck with the contents being
>> commited in the hook, and did not implement such verification.

Earlier you said you are working on a patch series.  Since you have
already looked at the codepath already, perhaps you may want to try
a patch series to add the missing error-return instead, if you are
interested?

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 17:28     ` Junio C Hamano
@ 2016-07-15 17:42       ` Andrew Keller
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Keller @ 2016-07-15 17:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

On 15.07.2016, at 1:28 nachm., Junio C Hamano <gitster@pobox.com> wrote:

> Earlier you said you are working on a patch series.  Since you have
> already looked at the codepath already, perhaps you may want to try
> a patch series to add the missing error-return instead, if you are
> interested?

Definitely interested — Sounds like a great learning experience.

Thanks,
 - Andrew Keller


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 16:34 obsolete index in wt_status_print after pre-commit hook runs Andrew Keller
  2016-07-15 17:02 ` Junio C Hamano
@ 2016-07-15 20:30 ` Andrew Keller
  2016-07-15 21:19   ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Keller @ 2016-07-15 20:30 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano

Am 15.07.2016 um 12:34 nachm. schrieb Andrew Keller <andrew@kellerfarm.com>:

> I pulled out the source for version 2.9.1 and briefly skimmed how run_commit and
> prepare_to_commit work.  It seems that Git already understands that a pre-commit
> hook can change the index, and it rereads the index before running the
> prepare-commit-msg hook: https://github.com/git/git/blob/v2.9.1/builtin/commit.c#L941-L951

Quick question: Why does Git reread the index after the pre-commit hook runs?

Thanks,
 - Andrew Keller


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 20:30 ` Andrew Keller
@ 2016-07-15 21:19   ` Junio C Hamano
  2016-07-15 22:03     ` Junio C Hamano
  2016-07-16  2:23     ` Andrew Keller
  0 siblings, 2 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-07-15 21:19 UTC (permalink / raw)
  To: Andrew Keller; +Cc: Git List

On Fri, Jul 15, 2016 at 1:30 PM, Andrew Keller <andrew@kellerfarm.com> wrote:
> Am 15.07.2016 um 12:34 nachm. schrieb Andrew Keller <andrew@kellerfarm.com>:
>
>> I pulled out the source for version 2.9.1 and briefly skimmed how run_commit and
>> prepare_to_commit work.  It seems that Git already understands that a pre-commit
>> hook can change the index, and it rereads the index before running the
>> prepare-commit-msg hook: https://github.com/git/git/blob/v2.9.1/builtin/commit.c#L941-L951
>
> Quick question: Why does Git reread the index after the pre-commit hook runs?

Offhand I do not think of a good reason to do so; does something break
if you took it out?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 21:19   ` Junio C Hamano
@ 2016-07-15 22:03     ` Junio C Hamano
  2016-07-16  2:39       ` Andrew Keller
  2016-08-03 18:25       ` Andrew Keller
  2016-07-16  2:23     ` Andrew Keller
  1 sibling, 2 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-07-15 22:03 UTC (permalink / raw)
  To: Andrew Keller; +Cc: Git List

Junio C Hamano <gitster@pobox.com> writes:

> On Fri, Jul 15, 2016 at 1:30 PM, Andrew Keller <andrew@kellerfarm.com> wrote:
>> Am 15.07.2016 um 12:34 nachm. schrieb Andrew Keller <andrew@kellerfarm.com>:
>>
>>> I pulled out the source for version 2.9.1 and briefly skimmed how run_commit and
>>> prepare_to_commit work.  It seems that Git already understands that a pre-commit
>>> hook can change the index, and it rereads the index before running the
>>> prepare-commit-msg hook: https://github.com/git/git/blob/v2.9.1/builtin/commit.c#L941-L951
>>
>> Quick question: Why does Git reread the index after the pre-commit hook runs?
>
> Offhand I do not think of a good reason to do so; does something break
> if you took it out?

Ahh, I misremembered.  2888605c (builtin-commit: fix partial-commit
support, 2007-11-18) does consider the possibility that pre-commit
may have modified the index contents after we take control back from
that hook, so that is probably a good place to enumerate what got
changed.  Getting the list before running the hook can give an
out-of-date list, as you said.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 21:19   ` Junio C Hamano
  2016-07-15 22:03     ` Junio C Hamano
@ 2016-07-16  2:23     ` Andrew Keller
  1 sibling, 0 replies; 13+ messages in thread
From: Andrew Keller @ 2016-07-16  2:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

Am 15.07.2016 um 5:19 nachm. schrieb Junio C Hamano <gitster@pobox.com>:
> 
> On Fri, Jul 15, 2016 at 1:30 PM, Andrew Keller <andrew@kellerfarm.com> wrote:
>> Am 15.07.2016 um 12:34 nachm. schrieb Andrew Keller <andrew@kellerfarm.com>:
>> 
>>> I pulled out the source for version 2.9.1 and briefly skimmed how run_commit and
>>> prepare_to_commit work.  It seems that Git already understands that a pre-commit
>>> hook can change the index, and it rereads the index before running the
>>> prepare-commit-msg hook: https://github.com/git/git/blob/v2.9.1/builtin/commit.c#L941-L951
>> 
>> Quick question: Why does Git reread the index after the pre-commit hook runs?
> 
> Offhand I do not think of a good reason to do so; does something break
> if you took it out?

According to only test failures, it seems that only the `update_main_cache_tree(0)` invocation
is needed to avoid a torrent of test failures (490 failures across 102 tests).  Removing lines
946, 947, 949, and 950 do not cause test breakages (although my computer is not set up to
run all of the tests).

However, there seems to be an interaction between lines 946-947 and `update_main_cache_tree(0)`
on line 948: although lines 946-947 can be removed by themselves without test breakages,
when 946-948 are all disabled together (and, in turn, lines 949-950 never run), one additional test
failure is registered (t2203.5).

Thanks,
 - Andrew Keller

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 22:03     ` Junio C Hamano
@ 2016-07-16  2:39       ` Andrew Keller
  2016-08-03 18:25       ` Andrew Keller
  1 sibling, 0 replies; 13+ messages in thread
From: Andrew Keller @ 2016-07-16  2:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

Am 15.07.2016 um 6:03 nachm. schrieb Junio C Hamano <gitster@pobox.com>:
> Junio C Hamano <gitster@pobox.com> writes:
>> On Fri, Jul 15, 2016 at 1:30 PM, Andrew Keller <andrew@kellerfarm.com> wrote:
>>> Am 15.07.2016 um 12:34 nachm. schrieb Andrew Keller <andrew@kellerfarm.com>:
>>> 
>>>> I pulled out the source for version 2.9.1 and briefly skimmed how run_commit and
>>>> prepare_to_commit work.  It seems that Git already understands that a pre-commit
>>>> hook can change the index, and it rereads the index before running the
>>>> prepare-commit-msg hook: https://github.com/git/git/blob/v2.9.1/builtin/commit.c#L941-L951
>>> 
>>> Quick question: Why does Git reread the index after the pre-commit hook runs?
>> 
>> Offhand I do not think of a good reason to do so; does something break
>> if you took it out?
> 
> Ahh, I misremembered.  2888605c (builtin-commit: fix partial-commit
> support, 2007-11-18) does consider the possibility that pre-commit
> may have modified the index contents after we take control back from
> that hook, so that is probably a good place to enumerate what got
> changed.  Getting the list before running the hook can give an
> out-of-date list, as you said.

Interesting.  So, the implication is that disallowing the pre-commit hook
to change the index may cause some problems (491 problems, if my run
of the tests was accurate).

Does that mean it would be desirable to update the index before the
commit message template is rendered?

Thanks,
 - Andrew Keller


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-07-15 22:03     ` Junio C Hamano
  2016-07-16  2:39       ` Andrew Keller
@ 2016-08-03 18:25       ` Andrew Keller
  2016-08-04 16:45         ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Keller @ 2016-08-03 18:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

Am 15.07.2016 um 6:03 nachm. schrieb Junio C Hamano <gitster@pobox.com>:
> 
> Ahh, I misremembered.  2888605c (builtin-commit: fix partial-commit
> support, 2007-11-18) does consider the possibility that pre-commit
> may have modified the index contents after we take control back from
> that hook, so that is probably a good place to enumerate what got
> changed.  Getting the list before running the hook can give an
> out-of-date list, as you said.

I’ve been experimenting with two different workflows recently:

    (1) Identify problem files during the pre-commit hook;
        when found, fix them automatically in the index and let the commit continue.
    (2) Identify problem files during the pre-commit hook;
        when found, provide instructions to fix the problem (and possibly set a helpful
        Git alias to do it in one command), and abort the commit.  Require that the user fixup
        the index and try the commit again.

And here are my thoughts:

#1 seems to be quick and simple for the user, and it plays (mostly) nice with scripts
and IDEs that do commits autonomously, but I’m having trouble trusting that my
pre-commit hook made the *correct* changes (even though it’s worked nicely so far)
(i.e., I keep looking at the new HEAD commit to make sure it looks right, where
normally I just look at the index and make sure it looks right).

#2 is slightly more difficult to implement just because it has more moving parts,
however I’m finding that because I can interrogate the index after I manually run
the command to make the required changes to the index, and *before* I commit
again, I feel much more confident that I know what is going to be in my commit.
However, this approach doesn’t play well with automated scripts that assume that
a commit operation will always work.

In summary, I think I prefer #2 from a usability point of view, however I’m having
trouble proving that #1 is actually *bad* and should be disallowed.

Any thoughts?  Would it be better for the pre-commit hook to be officially allowed to
edit the index [1], or would it be better for the pre-commit hook to explicitly *not* be
allowed to edit the index [2], or would it be yet even better to simply leave it as it is?

[1] and possibly create a patch that teaches builtin/commit.c to reread the index
    after the pre-commit hook runs and before rendering the commit message template
[2] and possibly create a patch that teaches builtin/commit.c to detect changes to the
    index after the pre-commit hook runs

Thanks,
 - Andrew Keller

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-08-03 18:25       ` Andrew Keller
@ 2016-08-04 16:45         ` Junio C Hamano
  2016-08-05 13:22           ` Andrew Keller
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-08-04 16:45 UTC (permalink / raw)
  To: Andrew Keller; +Cc: Git List

Andrew Keller <andrew@kellerfarm.com> writes:

> In summary, I think I prefer #2 from a usability point of view, however I’m having
> trouble proving that #1 is actually *bad* and should be disallowed.

Yeah, I agree with your argument from the usability and safety point
of view.

> Any thoughts?  Would it be better for the pre-commit hook to be
> officially allowed to edit the index [1], or would it be better
> for the pre-commit hook to explicitly *not* be allowed to edit the
> index [2], or would it be yet even better to simply leave it as it
> is?

It is clear that our stance has been the third one so far.

Another thing I did not see in your analysis is what happens if the
user is doing a partial commit, and how the changes made by
pre-commit hook is propagated back to the main index and the working
tree.

The HEAD may have a file with contents in the "original" state, the
index may have the file with "update 1", and the working tree file
may have it with "update 2".  After the commit is made, the user
will continue working from a state where the HEAD and the index have
"update 1", and the working tree has "update 2".  "git diff file"
output before and after the commit will be identical (i.e. the
difference between "update 1" and "update 2") as expected.

If pre-commit were allowed to munge the index to have the file in
the "update 3" state, the resulting commit would have that version
of the file in its tree.  By definition, "update 1" and "update 3"
are different (that is what it means to allow pre-commit to munge
the index); where should the differences between "update 1" and
"update 3" go?  It is clear that pre-commit thought that the
contents in the "update 1" state is bad and "update 3" state is
better (that is why it made that fix), so after the commit is made,
we would want to have "update 3" in the index.  But what would you
do to the working tree file, which is in "update 2" state?  If you
do not do anything, "git diff" would show the remaining edit the
user had before starting the commit (i.e. difference between "update
1" and "update 2") plus a reversion of the edit pre-commit made
because what the working tree has, "update 2", is based on "update 1"
and has never heard of the change pre-commit did.

But leaving the working tree file as-is is the only safe choice, as
I do not think we want "git commit" to _create_ new conflict in the
working tree by attempting to merge (we _could_, and implementing it
would be a trivial thing to do by calling ll_merge() to three-way
merge "update 2" and "update 3" that are both based on "update 1",
but the result from the end-user's point of view is too _weird_).

So, I tend to think we should not allow pre-commit to munge the
index.  We should be able to detect fairly cheaply if pre-commit
munged the index by remembering the trailing SHA-1 of the index file
given to the pre-commit hook before running it, and reading the
trailing SHA-1 of the index file left after the pre-commit hook and
comparing them.  And we would yell at the user that his pre-commit
munged the index and abort.

Or something like that.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: obsolete index in wt_status_print after pre-commit hook runs
  2016-08-04 16:45         ` Junio C Hamano
@ 2016-08-05 13:22           ` Andrew Keller
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Keller @ 2016-08-05 13:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List


Am 04.08.2016 um 12:45 nachm. schrieb Junio C Hamano <gitster@pobox.com>:
> Andrew Keller <andrew@kellerfarm.com> writes:
> 
>> In summary, I think I prefer #2 from a usability point of view, however I’m having
>> trouble proving that #1 is actually *bad* and should be disallowed.
> 
> Yeah, I agree with your argument from the usability and safety point
> of view.
> 
>> Any thoughts?  Would it be better for the pre-commit hook to be
>> officially allowed to edit the index [1], or would it be better
>> for the pre-commit hook to explicitly *not* be allowed to edit the
>> index [2], or would it be yet even better to simply leave it as it
>> is?
> 
> It is clear that our stance has been the third one so far.
> 
> Another thing I did not see in your analysis is what happens if the
> user is doing a partial commit, and how the changes made by
> pre-commit hook is propagated back to the main index and the working
> tree.
> 
> The HEAD may have a file with contents in the "original" state, the
> index may have the file with "update 1", and the working tree file
> may have it with "update 2".  After the commit is made, the user
> will continue working from a state where the HEAD and the index have
> "update 1", and the working tree has "update 2".  "git diff file"
> output before and after the commit will be identical (i.e. the
> difference between "update 1" and "update 2") as expected.

Excellent point — one I had discovered myself but neglected to include in
my email.  In my post-commit hook, I have logic in both versions of my
experiment that disallows [1] fixing up diffs that are partially staged.  Both
scripts then update both the index and the working copy.  (Sort of like how
rebase works — clean working directory required, and then it updates the
index and the work tree)

[1] In version #1, if any files it wants to change are partially staged, it
    prints a detailed error message and aborts the commit outright.  In
    version #2, the pre-commit hook sees the change it _wants_ to make,
    informs the user that he/she should run the fixup command, aborts
    the commit, and when the user runs the fixup command, the fixup
    command sees the partially staged file, prints the same detailed error
    message, and dies.

Thanks for your help on this.  it’s really been interesting.  I’ll leave it as-is
for now.

Thanks,
 - Andrew Keller


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-08-05 13:22 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-15 16:34 obsolete index in wt_status_print after pre-commit hook runs Andrew Keller
2016-07-15 17:02 ` Junio C Hamano
2016-07-15 17:20   ` Andrew Keller
2016-07-15 17:28     ` Junio C Hamano
2016-07-15 17:42       ` Andrew Keller
2016-07-15 20:30 ` Andrew Keller
2016-07-15 21:19   ` Junio C Hamano
2016-07-15 22:03     ` Junio C Hamano
2016-07-16  2:39       ` Andrew Keller
2016-08-03 18:25       ` Andrew Keller
2016-08-04 16:45         ` Junio C Hamano
2016-08-05 13:22           ` Andrew Keller
2016-07-16  2:23     ` Andrew Keller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.