Re: [PATCH v2] hooks: propose project configured hooks

From: Albert Cui <albertqcui@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Albert Cui via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH v2] hooks: propose project configured hooks
Date: Fri, 2 Apr 2021 17:58:29 -0700	[thread overview]
Message-ID: <CAMbkP-Q4_z-nQzJwr2ZeM16deHKmzr=Z4908UzgOyk9FA-gKEA@mail.gmail.com> (raw)
In-Reply-To: <87zgyhj7vr.fsf@evledraar.gmail.com>

On Fri, Apr 2, 2021 at 3:30 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> I had comments on the v1 that are still unanswered by this re-roll:
> https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/
>
> I think a more productive way to handle such proposals is to reply to
> such general discussion and /then/ re-roll the series.
>
> I'm not going to repeat the outstanding points there (but would like you
> to reply to it, and having just read Derrick Stolee's E-Mail downthread
> there's another significant point of feedback to reply to.
>
Thanks for the feedback! I'm new to this process. I'll make sure to
reply to both.

[...]

> >      ++  ** The project may want to prevent developers from committing passwords or
> >      ++  other sensitive information e.g. via
> >      ++  https://github.com/awslabs/git-secrets[git-secrets].
>
> Why does the project want to prevent developers from *committing* such
> information by default? I commit such stuff all the time, it's only a
> problem once it's pushed.
>
> But I think this is answered below:
>
> >      ++Server-side vs Local Checks
> >      ++^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >      ++
> >      ++A large motivation for this change is providing projects a method to enable
> >      ++local checks of code e.g. linting. As documented in
> >      ++https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
> >      ++checks may be more appropriate, especially since developers can always skip
> >      ++local hook execution. We think there are still benefits to local checks:
> >      ++
> >      ++* Ensuring commit message style and preventing the committing of sensitive data,
> >      ++as described above. In particular, if CI results are public, as with many open
> >      ++source projects, server side checks are useless for guarding against leaking
> >      ++sensitive data.
>
> So what you really mean is you want a pre-commit hook as an alternative
> to some "once we've accpted the push and CI runs we notice naughty
> data", not as a pre-receive hook alternative?
>

I think you're identifying that "prevent developers from commiting" is
the wrong wording.
Maybe a better phrasing is "prevent sensitive information from getting pushed or
checked in."

Yes, this could be done in CI, but as noted below, catching this
before the push (or commit)
happens is more productive for developers:

> >      ++
> >      ++* Helps developers catch issues earlier: typically developers need to push to
> >      ++the remote to trigger server-side checks. Local hooks can be run anytime the
> >      ++developer wants. This is especially useful if the project has slow
> >      ++server-checks; catching issues locally can save the developer a lot of time
> >      ++waiting for CI. They are also useful for locally reproducing an issue identified
> >      ++in CI, helping resolve issues faster.
> >      ++

An additional point I should also call out is that for large
repositories, pushes themselves
can be slow, so even if server-side checks are fast, being able to
avoid unnecessary pushes
can help developer velocity.

>
> This all makes sense, but is really missing how this problem isn't adequately solved by:
>
>     $ HACKING
>     Welcome to project XYZ, first you'll be much more productive with
>     our hooks, run:
>
>         ./setup-hooks.sh
>
>     [...]
>
> Presumably developers working on some central repo are the ones least
> served by this type of thing, since such setups usually have established
> setup scripts etc. that you (mostly) go through once.
>

One issue that immediately comes to mind with a setup script is that
developers could
easily miss out on updates to the hooks. One nice thing about this
proposal is that we
try to address that problem.

Anothing issue is that people in general are bad at reading
instructions; that's why I'm
trying to get as close as possible to `git clone` doing set up for you
while balancing the
security concerns (I know Derrick Stolee finds issue with this in his
reply, which I'll try to
address in my reply there.)

> >      ++In the ideal world, developers and project maintainers use both local and server
> >      ++side checks in their workflow. However, for many smaller projects, this may not
> >      ++be possible: CI may be too expensive to run or configure. The number of local
> >      ++solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> >      ++Bringing this natively to Git can give all these developers a well-supported,
> >      ++secure implementation opposed to the fragmentation we see today.
>
> The end-goal of this series combined with Emily's configurable hook is
> basically to have less friction when it comes to setting up and
> maintaining hooks.
>
> I don't see how it would help with "CI may be too expensive to run or
> configure" though. We're basically talking about auto-updating a script
> in .git/hooks, which today is essentially a ./setup-hooks.sh, and the
> script checking for a new version of itself when it runs at
> origin/myscripts:myname.sh, no?
>

By "expensive", I mean from both a money and a time perspective: for small
projects, you may not set up any server-side checks and instead rely
purely on local checks,
and this helps by, as you said, reducing the friction to set up these
local checks.

From my own experience: When I start a new weekend project, setting up
CI is not at the top
of the list of things I want to spend time on; I'm usually not even
writing tests. Maybe
I'm just a bad developer :D But I do set up local checks: linting,
code formatters.

To your point about updating hooks (we're thinking on the same
lines!): that's putting the
responsibility on the developer to implement. From a
maximizing-global-productivity
point of view, isn't it more elegant for us to extend Git's
functionality and provide good support
for this use case?

> >      -+* As a repository owner / project maintainer,
> >      ++* As a project maintainer,
> >       +
> >       +    ** I want to enforce code style so I require developers to use a
> >       +    standardized linter.
> >       +
> >      -+    ** I want to prevent leaks / share code widely so I check that developers
> >      -+    are uploading code to the right place.
> >      ++    ** I want to prevent sensitive data from getting checked in.
> >       +
> >      -+    ** I want this to just work for all the developers in my repository, without
> >      ++    ** I want to prevent leaks so I check that developers are uploading code to
> >      ++    the right private central repository. Conversely, I may want to encourage
> >      ++    more open source development and encourage developers to push to public
> >      ++    central repositories.
>
> I think I'm beginning to get the gist of this, so it's really also a
> proposed workaround for projects that host on platforms like github.com
> and whatnot where you can run arbitrary code in a CI, but you can't
> install a custom pre-receive hook?
>
> I think it might be worth explicitly spelling that out. E.g. if those
> platforms gained a feature (which isn't that hard to imagine) of hiding
> your ref until after some pre-receive part of a CI check has run (which
> would not have public logs, so you could "safely" check/push passwords)
> it seems to me that a large part of the immediate need for this would go
> away.
>
> Which doesn't per-se mean we shouldn't do it, just that assumptions,
> constaints etc. should be documented.
>

Agree we can call this out, but as I'm saying above, I don't think
it's fair to assume
everyone is already using server-side checks and there's still benefit
to doing the same
checks locally even if you have server-side checks.

[...]

> Snip the rest of the doc, which as noted I've god unreplied-to feedback
> on in https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/

Really appreciate the engagement! I'll try to reply early next week.