Re: Issue when changing staged files in a pre-commit hook

From: Junio C Hamano <gitster@pobox.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Niek van der Kooy <niekvanderkooy@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Issue when changing staged files in a pre-commit hook
Date: Tue, 19 Jan 2016 17:44:22 -0800	[thread overview]
Message-ID: <xmqqk2n56n49.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <CACsJy8Aj+_Qra312EWRsjmebd67EFQt4RP4LWFUyXUB-QCnKvQ@mail.gmail.com> (Duy Nguyen's message of "Wed, 20 Jan 2016 07:23:19 +0700")

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Jan 20, 2016 at 6:41 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
>> The only sensible thing we can do at that point in the code after
>> re-reading the index is to make sure that hasn't been changed by the
>> pre-commit hook and complain loudly to die if we find it modified, I
>> think.
>
> OK. Two more points
>
>  - do we catch index changes for partial commit case only? Because
> when $GIT_DIR/index is used, we do not have this problem.

I do not think you are correct.  As-is commit with un-added changes
in the working tree would have the same problem.  If the hook munged
the real index without touching the working tree, the next "commit
-a" would revert that change unless you reflect the same munging the
hook did to the working tree.

So if you _really_ wanted to be lenient and allow hooks to muck with
the index, you would have to do something along this line:

 - Take a snapshot of the working tree files (call it W0).
 - Prepare the false-index for partial commit if necessary.
 - Write-tree the index used to record the next commit (call the result T0)
   before calling the hook and showing that index to it.
 - Let the hook munge the index (and possibly the working tree.
 - Re-read the index and write-tree (call the result T1).
 - Take a snapshot of the working tree files (call it W1).
 - If T1 != T0:
     - If making a partial commit
       - apply diff(T0,T1) to the real index.
       - apply diff(T0,T1) - diff(W0,W1) to the working tree files.
     - Otherwise
       - apply diff(T0,T1) - diff(W0,W1) to the working tree files.

 And if any of the "apply the change" fails, fail the whole thing
 without leaving any damage.  As the index and the HEAD after making
 a partial commit is allowed to be different, we cannot resolve the
 3-way merge conflict using the index the usual way.

 And if all of the "apply the change" succeeds (including the easier
 cases where W0 == W1), allow such a change by the hook.

That would give us a more lenient system, but I do not necessarily
think that it would be a good thing.

Think what the error message has to say when "if any of the apply
fails" case.  "Commit failed because your hook munged X in such and
such way" (where X might be the index, the working tree, or both,
and "such and such way" would describe the overlapping changes that
cannot be automatically reconciled).  What is the impression such a
message give the user?  A hook is allowed to muck with the index
only under some complex condition but not always?  Surely there is a
solid technical reason why it cannot be allowed, but is that a rule
that is easy to understand by end users?

Wouldn't it be simpler if the rule the user has to remember is
"pre-X hook are designed to inspect and reject, not to tweak and
munge" (which has always been the case as far as I am concerned, and
the fact that the code trusted hooks too much to follow the rule and
did not verify was merely a bug) and rejected a hook that munged the
index when it did so, regardless of the state of the index and the
working tree when the commit command is run and the kind of commit
(either as-is or partial) that we are creating?

>  - is Niek's use case still valid? If so, what do we do to support it
> (at least in partial commit case)?

Especially in partial commit case, but more in general in any case,
I'd say the result is "undefined".

I think the _only_ case we can safely port forward the munging done
by the hook is during "git commit -a".  In that case, we know that
the working tree exactly matches the index being committed before
the hook runs, and after the hook munged the index, regardless of
what it did (or didn't) do to the working tree, the resulting HEAD,
the index, and the working tree ought to match, so porting forward
is just a matter of running "reset --hard" when we notice that the
hook munged the contents to be committed with "commit -a".

All other cases can create conflicts that we cannot express to the
user to resolve, and will introuce "hook is allowed to muck with the
index only under some complex condition but not always" confusion.

The above does not have to be the final verdict, though.  Somebody
else may able to come up with a more lenient behaviour that is easy
to understand by end users (I somehow doubt about the latter; coming
up with "a more lenient" behaviour is easy, and anybody who hacks up
such a behaviour may argue that the behaviour is easy to understand,
but that is merely due to knowing the implementation--explaining it
to the end users is a different matter).