Re: [PATCH v2] hooks: propose project configured hooks

From: Albert Cui <albertqcui@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: "Albert Cui via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v2] hooks: propose project configured hooks
Date: Mon, 5 Apr 2021 15:45:10 -0700	[thread overview]
Message-ID: <CAMbkP-S-9cccMpU4HG0Wurqap-WkTmD2zk50nKd9kJ_oWO__qw@mail.gmail.com> (raw)
In-Reply-To: <ec031dc8-e100-725b-5f27-d3007c55be87@gmail.com>

On Tue, Mar 30, 2021 at 8:24 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/25/2021 9:43 PM, Albert Cui via GitGitGadget wrote:
> > From: Albert Cui <albertqcui@gmail.com>
> >
> > Hooks today are configured at the repository level, making it difficult to
> > share hooks across repositories. Configuration-based hook management, by
> > moving hooks configuration to the config, makes this much easier. However,
> > there is still no good way for project maintainers to encourage or enforce
> > adoption of specific hook commands on specific hook events in a repository.
> > As such, there are many tools that provide this functionality on top of Git.
> >
> > This patch documents the requirements we propose for this feature as well as
> > a design sketch for implementation.
>
> Sorry for being so late in reviewing this.
>
> My first reaction is that this feature is suggesting multiple security
> vulnerabilities as core functionality. It also seems to be tied to
> niche projects (in number of projects, not necessarily the size of those
> projects).
>

According to it's GitHub page, Husky [0] has 22M million monthly downloads and
is used by 437K projects. Many projects, both big (e.g. Android [1],
ChromeOS [2],
Angular [3], Webpack [4]) and small have use cases around hooks configuration,
so I don't think it's fair to say this is only needed by niche projects.

> I was recommended in conversation to think of this as a way to take
> existing ad-hoc behavior and standardize it with a "Git-blessed"
> solution. I'm not sure this proposal makes a strong enough case for
> why having a "configure-hooks.sh" script in the base of the repo is
> not enough. It simultaneously does not use existing precedents like
> .gitattributes or .gitignore as direction in using the worktree at
> HEAD as a mechanism for communicating details. I find using a separate
> ref for hooks to be a non-starter and the design should be rebuilt from
> scratch.
>

Right, this entire proposal is trying to get to a "Git-blessed" solution,
and I should make the need clearer. A few reasons for standardizing
this come to mind:

1. Many existing "standard" solutions. A multitude of existing solutions for
this use case speaks to the fact that a basic config script is not sufficient.
I mentioned Husky above, but here are a few more; basically each
popular programming language environment has a solution for this.

https://github.com/sds/overcommit - Ruby
https://github.com/pre-commit/pre-commit - Python
https://github.com/Arkweid/lefthook - Go
https://github.com/shibapm/Komondor - Swift
https://github.com/typicode/husky - Node

These solutions all handle the installation and updating of hooks. A
"configure-hooks.sh" script doesn't handle hook updates, unless you go through
the trouble yourself of implementing and maintaining that.

2. External dependencies. The existing solutions require users to
either have those
toolchains already installed or an explicit installation step via an OS package
manager, so there's a lot of friction for users, even when using these
standard solutions.
This functionality shouldn't require external dependencies, and most
certainly _how_ you
set up hooks shouldn't change depending on your coding environment.
Fixing this is only
possible in a world where Git supports this functionality natively.

3. Improving security. As you mentioned, hooks are difficult to get
right from a security
perspective, and standardizing on a single implementation allows us to
give developers
a well-vetted solution with a better security model than what exists
today. For example,
we're proposing making it very clear to users whenever there's a hook
update. This isn't
something that existing solutions do.

I'll also say in general, the Git project is much more likely to get
security right than smaller
projects, where oftentimes even popular projects end up unmaintained.

For the design feedback: what I'm hearing is that you'd prefer a
design that keeps configuration in the worktree. I'll leave discussion about
implementation to those on the list better suited :)

However, one thing that I should have mentioned in patch about the design
proposal (as Emily Shaffer touched upon in her response) is that since the
configuration is separate from the code base, it allows you to go back
in history
or to older branches while preserving "improvements" to the hooks configuration
e.g. maybe the project has shifted to using a newer version of a linter.

This functionality has pros and cons. Bringing improvements to checks
can be useful
e.g. it's arguably better to bring older code up to newer code style.
But it also makes
it less possible to replicate the exact state of older history. This
is a problem that
server-side checks also have.

> I also expect that a significant portion of users will see a message
> like "this repository needs hooks" and will just say "yes" to get rid
> of the prompt. There needs to be sufficient opportunity for users to
> inspect the hook configuration and avoid frustrated or distracted users
> from doing the wrong thing.
>
> Server-side checks should always exist, so users who don't follow the
> project's guidelines using the recommended hooks will be blocked. The
> important thing is that there is an easy way for willing participants
> to install the correct hooks. This doesn't mean we should make it
> almost automatic.
>

What I'm hearing here is that you don't like the interactive prompt, as you
noted in the UX feedback below. Allowing users the chance to inspect the hook
configuration makes sense to call out explicitly as part of the security model.

> Also, please proactively pursue a security review of the feature,
> including non-technical risks such as social engineering, forks, or
> other possible attacks. This idea seems so risky that I would be
> against accepting it unless a security expert has done a thorough
> review.
>

Agreed. We already did a security review internally at Google. The main
feedback was:

* We need an explicit opt-in opposed to setting hooks up automatically,
e.g. a command line flag like --accept-hooks at minimum. This is primarily
to distinguish people who are just cloning a repository to browse the code
from people who are developing.

* The average user doesn't have the ability to review hooks in general
(security is hard and obscuration is easy), and if the user has
already opted into
this feature because they are engaged in development, it's very likely
that they're
already running build scripts, so the additional attack vector here doesn't seem
like a big issue.

If there are other security folks I should seek advice from, please let me know.

[...]
> I think providing a way for repository owners to _recommend_ how cloners
> should interact with the repository is a good idea. I think starting with
> hooks is perhaps a significant jump to the most complicated version of
> that idea.
>
> As you think of this design, it might be good to think about how some
> recommended Git config (within an allow-list) might fit into this system
> as well. I would have started there, with things like "Use partial clone"
> or "use sparse-checkout". Those are really things that need to happen at
> clone time, they can't really happen after-the-fact, which helps justifying
> a modification to 'git clone'. The hook configuration doesn't _need_ to
> happen during 'git clone'. More on this timing later.
>
> The .gitattributes file is the closest analogue I could find in current
> functionality, but it operates on a path-based scope, not repository scope.

I am happy to extend this patch to talk about configuration in general with
a specific use case of suggesting hooks. However, I do want to separate out
"what should we do when?" from "should we do this?" Based on this feedback,
I'm hearing "we should have a design for suggested configuration in general,"
but I don't think that necessitates that we should implement, for
example, partial
clone configuration before hooks configuration.

>
> > +Server-side vs Local Checks
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ...
> > +In the ideal world, developers and project maintainers use both local and server
> > +side checks in their workflow. However, for many smaller projects, this may not
> > +be possible: CI may be too expensive to run or configure. The number of local
> > +solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> > +Bringing this natively to Git can give all these developers a well-supported,
> > +secure implementation opposed to the fragmentation we see today.
>
> I'm not sure this is a good selling point for small projects. If they
> are small, then the CI to verify commits is cheap(er).
>
> Local hooks should never be used as a replacement for server-side checks.
> A user could always use a repository without the local hooks and push
> commits that have not been vetted locally. The extreme example is to have
> a commit hook that compiles the code and runs all the tests. Would you
> then remove all CI builds?
>
> Making it easier to adopt local hooks can avoid some pain points when
> users are blocked by the server-side checks.
>

Sorry, this wasn't meant to come off as "small projects don't need CI." It was
more to say, "small projects today are already primarily relying on
local checks,
so we should make that easier for them." I clarified that a bit here as well:
https://lore.kernel.org/git/CAMbkP-Q4_z-nQzJwr2ZeM16deHKmzr=Z4908UzgOyk9FA-gKEA@mail.gmail.com/T/#u

[...]

> I appreciate the motivation in this document. However, the motivation
> doesn't really justify why this should be baked into Git itself, since
> a "configure-repo" script in the base of the repo would suffice to
> achieve that functionality.
>
> The reason to put this in Git is to standardize this process so it
> is not different in each repository. It might be good to spend time
> justifying that angle.
>

Thank you for the feedback. I hope what I wrote above helps.

[...]
> > +* Trust comes from the central repository:
> > +  ** Most users don't have the time or expertise to properly audit every hook
> > +  and what it does. There must be trust between the user and the remote that the
> > +  code came from, and the Git project should ensure trust to the degree it can
> > +  e.g. enforce HTTPS for its integrity guarantees.
> > +
> > +  ** Since developers will likely build their local clone in their development
> > +  process, at some point, arbitrary code from the repository will be executed.
> > +  In this sense, hooks _with user consent_ do not introduce a new attack surface.
>
> It is critical that users are presented with this consent at the correct
> times. For instance, I believe configuring local hooks should only be
> done _after_ "git clone" completes. That allows a user to inspect the
> worktree to their content instead of in the middle of an interactive shell
> session or something. (The "git clone" command could output a message to
> stderr saying "This repository recommends configuring local hooks. Run
> 'git <tbd>' to inspect the hooks and configure them.")
>
> We've had enough code-execution bugs with "git clone" that I want to
> completely avoid that possibility here.
>

Responding to this below in the "Consent through clone" section.

> > +* Give users visibility: Git must allow users to make informed decisions. This
> > +means surfacing essential information to the user in a visible manner e.g. what
> > +remotes the hooks are coming from, whether the hooks have changed in the latest
> > +checkout.
>
> As a user moves HEAD, we should similarly avoid updating the hooks
> automatically, but instead present a message to the user to update their
> hooks using an intentional command.
>

Responding to this below.

> > +    ** This could be a path to a script/binary within the repository
>
> Binaries will be tricky if you want users of multiple platforms to
> interact with your repository. And scripts can be slower than binaries.
>
> How could someone build hooks from source using your workflow? Perhaps
> users are expected to locally compile the code before configuring the
> hooks?
>

"Binaries" was primarily referring to packages like pylint which are installed
at the OS level. In an enterprise environment, these could be
installed automatically
for the user.

> > +    ** This could be a path to a script/binary contained within submodules of
> > +    the repository
>
> This gives me significant chills. Proceed with caution here.
>
> I understand the reason to want this feature: you could have a suite
> of repositories using a common hook set that lives in each as a
> submodule. I just want to point out that this adds yet another
> dimension for attack.
>

If we notify users about changes, both to hook commands and to the
underlying source files, we can make this safer.

> > +    ** This could be a user installed command or script/binary that exists
> > +    outside of the repository and is present in `$PATH`
>
> Like `rm -rf ~/*`? I'm trying to think of dangerous things to do without
> elevation. It could help here to clarify the intended user pattern here:
> "This repository requires that you install tool X." This seems unlikely
> to be necessarily true at clone time, so the users will have a broken
> state if they don't run some extra steps. How will that be communicated?
>
> Requirements like these make me think that these repositories would be
> better off with a script that configures the hooks after checking if
> these things actually exist on the PATH (and installs them if not). I
> would lower the priority of this one for now.
>

As mentioned, for enterprise deployments, this can be solved by administrators
installing any necessary software automatically.

Otherwise, I think ensuring the tool is installed feels out-of-scope
(as written in
the doc); it's not like Git makes sure compilers or build tools are
installed today,
and even today, users could set up Husky hooks that rely on $PATH tools, so
we're not introducing a new problem.

Some day, maybe we could have a `post-clone` and `post-checkout` hooks,
but I think this is all out-of-scope of the immediate discussion.

> > +* This configuration should only apply if it was received over HTTPS
>
> Avoiding http:// and git:// makes sense. Why not SSH?
>

Left off SSH accidentally! Definitely makes sense to include.

> > +* A setup command for users to set up hooks
> > +
> > +    ** Hook setup could happen at clone time assuming the user has consented
> > +    e.g. if `--setup-hooks` is passed to `git clone`
>
> This is not enough consent.
>

I hear what you're saying. How would you feel if this specific
functionality was behind
config (that enterprise environments could ship)? What I'm thinking:

```
#~/.gitconfig
[hook]
   allowCloneInstallFromRemote = $REMOTE
```

IFF $REMOTE matches the config, then `git clone $REMOTE --setup-hooks` works.

(We could even get rid of --setup-hooks and rename the config to
hook.installOnCloneFromRemote but I think requiring user consent is
still best here.)

> > +* Users must explicitly approve hooks at least once
> > +
> > +    ** Running the setup command should count as approval, including if the user
> > +    consented during the clone
> > +
> > +    ** When a hook command changes, a user should re-approve execution (note:
> > +    implementation should not interfere with requirement listed in “Fast
> > +    Follows")
>
> Users should explicitly approve hooks any time they would change.
> They should also be able to explore the source of the change using
> whatever editors and tools they want, so the worktree should change
> to its new state without new hooks, _then_ the user could consider
> updating hooks based on that new state.
>

As mentioned above, developers will execute code locally anyway when building.
Do you think allowing users to opt-into hook updates significantly
increases the attack
surface? I should note that existing solutions already do hook updates
automatically and
silently (since they use a trampoline script); this proposal is safer
from that given
1) we ask users to opt into auto updates and 2) we warn users about changes.

> > +Fast Follows
> > +^^^^^^^^^^^^
> > +
> > +* When prompted to execute a hook, users can specify always or never, even if
> > +the hook updates
>
> I don't understand what this means. "when prompted to execute a hook" are you
> saying that the user will get a message saying "Git will now run the pre-commit
> hook, are you ok with that?"
>

This should say "When prompted to install a hook, users can specify always or
never, even if the hook updates. For security, this consent should be tied to
a specific remote."

[trimming some duplicate points]

> > +# instead of prompting, we could give users commands to run instead
> > +# see case 3
> > +
> > +Do you wish to install them?
> > +1. Yes (this time)
> > +2. Yes (always from origin)
> > +3. No (not this time)
> > +4. No (never)
>
> I'd rather see the installation as a separate step. That gives more
> weight to the users' consent. Even if you do have a prompt here that
> says Yes/No, *do not* include "always from origin".
>
> > +===== Case 3: Re-prompting when hooks change
> > +....
> > +$ git pull
> > +
> > +The following hooks were updated from remote `origin` ($ORIGIN_URL):
> > +
> > +pre-push:  $GIT_ROOT/pre_push.sh
> > +
> > +If you wish to install them, run `git hook setup origin`.
>
> Good. Stop here. Perhaps also describe this as something that happens
> with "git checkout" because it matters when HEAD updates, even if the
> commit was fetched earlier.
>

Sure, case 3 seems like a reasonable path to pursue. Sounds like auto-updating
is the main point of contention here.

[...]

> > +Implementation Sketch
> > +^^^^^^^^^^^^^^^^^^^^^
> > +
> > +* Perform fetch as normal
> > +
> > +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> > ++origin/refs/recommended-config+) which contains information about config lines
> > +an end-user may want (including hooks).
>
> I think this is the wrong direction to go. You are recommending a few things:
>
> 1. Some branch names are more special than others.
> 2. Hooks live in a separate history than the rest of the repository.
> 3. Users cannot inspect the hooks in their worktree before installation.
>
> Instead, think about things like .gitignore and .gitattributes, as they can
> change as the repository changes. Make a special _filename_ or directory:
> for example ".githooks/".
>

#2 is a feature, as mentioned above, but certainly just a
"nice-to-have." I agree we'd
need a solution to #3.

Thanks for all the feedback!

Albert

[0] https://github.com/typicode/husky
[1] https://android.googlesource.com/platform/tools/repohooks/
[2] https://chromium.googlesource.com/chromiumos/docs/+/HEAD/contributing.md#Upload-changes
[3] https://github.com/angular/angular/tree/master/.husky
[4] https://github.com/webpack/webpack/tree/master/.husky